Abstract

In this paper we present a theoretical analysis of the deterministic on-line
Sum of Squares algorithm (SS) for bin packing introduced and studied
experimentally in [CJK+99], along with several new variants. SS is applicable to any
instance of bin packing in which the bin capacity B and item sizes $s(a)$ are
integral (or can be scaled to be so), and runs in time $O(nB)$. It performs remarkably
well from an average case point of view: For any discrete distribution in which
the optimal expected waste is sublinear, SS also has sublinear expected waste.
For any discrete distribution where the optimal expected waste is bounded, SS
has expected waste at most $O(\log n)$. In addition, we discuss several interesting
variants on SS, including a randomized $O(nB \log B)$-time on-line algorithm
SS*, based on SS, whose expected behavior is essentially optimal for all discrete
distributions. Algorithm SS* also depends on a new linear-programming-based
pseudopolynomial-time algorithm for solving the NP-hard problem of determining,
given a discrete distribution F, just what is the growth rate for the optimal
expected waste. This article is a greatly expanded version of the conference paper
[CJK+00].
1 Introduction



In the classical one-dimensional bin packing problem, we are given a list $L = (a_1, \ldots, a_n)$
of items, a bin capacity B, and a size $s(a_i) \in (0, B]$ for each item in the list. We wish
to pack the items into a minimum number of bins of capacity B, i.e., to partition 
the items into a minimum number of subsets such that the sum of the sizes of the 
items in each subset is B or less. Many potential applications, such as packing small 
information packets into somewhat larger fixed-size ones, involve integer item sizes, 
fixed and relatively small values of B, and large values of n. 

The bin packing problem is NP-hard, so research has concentrated on the design 
and analysis of polynomial-time approximation algorithms for it, i.e., algorithms that 
construct packings that use relatively few bins, although not necessarily the smallest 
possible number. Of special interest have been on-line algorithms, i.e., ones that must 
permanently assign each item in turn to a bin without knowing anything about the sizes 
or numbers of additional items, a requirement in many applications. In this paper we 
shall analyze the Sum of Squares algorithm, an on-line bin packing algorithm recently 
introduced in [CJK+99] that is applicable to any instance whose item sizes are integral
(or can be scaled to be so), and is surprisingly effective. 



1.1 Notation and Definitions 

Let P be a packing of list L and for $0 \le h \le B$ let $N_P(h)$ be the number of
partially-filled bins in P whose contents have total size equal to $h$. We shall say that such a
bin has level $h$. Note that by definition $N_P(0) = N_P(B) = 0$. We call the vector
$(N_P(1), N_P(2), \ldots, N_P(B-1))$ the profile of packing P.

Definition 1.1 The sum of squares $ss(P)$ for packing P is $\sum_{h=1}^{B-1} N_P(h)^2$.



The Sum-of-Squares Algorithm (SS) introduced in [CJK+99] is an on-line algorithm
that packs each item according to the following simple rule: Let $a$ be the next item
to be packed and let P be the current packing. A legal bin for $a$ is one that is either
empty or has current level no more than $B - s(a)$. Place $a$ into a legal bin so as to yield
the minimum possible value of $ss(P')$ for the resulting packing $P'$, with ties broken in
favor of the highest level, and then in favor of the newest bin with that level. (Our
results for SS hold for any choice of the tie-breaking rule, but it is useful to have a 
completely specified version of the algorithm.) 

Note that in deciding where to place an item of size s under SS, the explicit 
calculation of $ss(P)$ is not required, a consequence of the following lemma.

Lemma 1.2 Suppose an item of size $s$ is added to a bin of level $h$ of packing P, thus
creating packing $P'$, and that $N_P(h+s) - N_P(h) = d$. Then

$$ss(P') - ss(P) = \begin{cases} 2d + 1, & \text{if } h = 0 \text{ or } h = B - s, \\ 2d + 2, & \text{otherwise.} \end{cases}$$

Proof. Straightforward calculation using the facts that $d = N_P(h+s)$ when $h = 0$
and $d = -N_P(h)$ when $h = B - s$. ■

Thus to find the placement that causes the least increase in $ss(P)$ one simply needs
to find that $i$ with $N_P(i) \ne 0$ that minimizes $N_P(i+s) - N_P(i)$, $0 \le i \le B - s$, under
the convention that $N_P(0)$ and $N_P(B)$ are re-defined to be $1/2$ and $-1/2$ respectively.
We currently know of no significantly more efficient way to do this in general than to
try all possibilities, so the running time for SS is $O(nB)$ overall.
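
The placement rule can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the names `ss_place` and `ss_pack` and the list-based profile representation are assumptions made here. The profile is stored as a list of length B whose entry `h` is the number of bins at level `h` (entry 0 is unused), and each candidate level is scored by the exact change in the sum of squares, which equals $2d + 2$ under the convention just described. Because only the profile is tracked, the secondary tie-break in favor of the newest bin is not modeled.

```python
from typing import List, Tuple


def ss_place(profile: List[int], s: int) -> int:
    """Pack one item of size s (1 <= s <= B) by the Sum-of-Squares rule.

    profile[h] is the number of partially filled bins at level h, 1 <= h <= B-1;
    profile[0] is unused and stays 0. The profile is updated in place and the
    level of the chosen bin is returned (0 means a new bin is opened).
    """
    B = len(profile)
    best_level = 0
    best_delta = None
    for h in range(0, B - s + 1):          # legal levels are 0 .. B - s
        if h > 0 and profile[h] == 0:
            continue                        # no bin currently has this level
        delta = 0                           # change in ss(P) for this placement
        if h > 0:
            delta += 1 - 2 * profile[h]     # count at level h drops by one
        if h + s < B:
            delta += 2 * profile[h + s] + 1  # count at level h + s rises by one
        # '<=' while scanning upward breaks ties in favor of the highest level
        if best_delta is None or delta <= best_delta:
            best_level, best_delta = h, delta
    if best_level > 0:
        profile[best_level] -= 1
    if best_level + s < B:
        profile[best_level + s] += 1
    return best_level


def ss_pack(sizes: List[int], B: int) -> Tuple[List[int], List[int]]:
    """Pack a whole list; return the final profile and the level chosen per item."""
    profile = [0] * B
    choices = [ss_place(profile, s) for s in sizes]
    return profile, choices
```

Each call scans at most B levels, which matches the $O(nB)$ overall running time stated above.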

In what follows, we will be interested in the following three measures of L and P. 

Definition 1.3 The size $s(L)$ of a list L is the sum of the sizes of all the items in L.
Definition 1.4 The length $|P|$ of a packing P is the number of nonempty bins in P.
Definition 1.5 The waste $W(P)$ of packing P is $\sum_{h=1}^{B-1} N_P(h) \cdot \frac{B-h}{B} = |P| - s(L)/B$.

Note that these quantities are related since $|P| \ge s(L)/B$ and hence $W(P) \ge 0$.
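
As a small sanity check of Definition 1.5, the following Python sketch computes the waste both from the profile and as $|P| - s(L)/B$ and confirms that the two agree on a toy packing. The helper names and the representation of a packing as a list of bins (each a list of item sizes) are assumptions introduced only for this illustration.

```python
from fractions import Fraction
from typing import List


def profile_of(bins: List[List[int]], B: int) -> List[int]:
    """Profile of a packing: entry h counts the bins whose level is h, 0 < h < B."""
    profile = [0] * B
    for contents in bins:
        level = sum(contents)
        if 0 < level < B:
            profile[level] += 1
    return profile


def waste_from_profile(profile: List[int], B: int) -> Fraction:
    """Sum over levels h of N_P(h) * (B - h) / B."""
    return sum(
        (Fraction(count * (B - h), B) for h, count in enumerate(profile)),
        Fraction(0),
    )


def waste_from_length(bins: List[List[int]], B: int) -> Fraction:
    """|P| - s(L)/B, counting only nonempty bins."""
    nonempty = [contents for contents in bins if contents]
    return len(nonempty) - Fraction(sum(map(sum, nonempty)), B)
```

For B = 6 and bins with contents (2, 2, 2), (3) and (3, 2), both functions return 2/3: the full bin wastes nothing, the bin at level 3 wastes 3/6 and the bin at level 5 wastes 1/6.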

We are in particular interested in the average-case behavior of SS for discrete 
distributions. A discrete distribution F consists of a bin size $B \in \mathbb{Z}^+$, a sequence
of positive integral sizes $s_1 < s_2 < \cdots < s_J \le B$, and an associated vector
$p_F = (p_1, p_2, \ldots, p_J)$ of nonnegative rational probabilities such that $\sum_{j=1}^{J} p_j = 1$. (Allowing
for the possibility that some $p_j$'s are 0 will be notationally useful later in the paper.) In
a list generated according to this distribution, the $i$th item $a_i$ has size $s(a_i) = s_j$ with
probability $p_j$, independently for each $i \ge 1$. We consider two key measures of average-case
algorithmic performance. For any discrete distribution F and any algorithm A,
let $P_n^A(F)$ be the packing resulting from applying A to a random list $L_n(F)$ of n items
generated according to F. Let OPT denote an algorithm that always produces an
optimal packing. We then have

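Definition 1.6 The expected waste rate for algorithm A and distribution F is

$$EW_n^A(F) \equiv E\left[W\left(P_n^A(F)\right)\right].$$

Definition 1.7 The asymptotic expected performance ratio for A and F is

$$ER_\infty^A(F) \equiv \limsup_{n \to \infty} E\left[\frac{A(L_n(F))}{OPT(L_n(F))}\right].$$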



1.2 Our results 

Let us say that a distribution F is perfectly packable if $EW_n^{OPT}(F) = o(n)$ (in which
case almost all of the bins in an optimal packing are perfectly packed). By a result
of Courcoubetis and Weber [CW90] that we shall describe in more detail later, the
possible growth rates for $EW_n^{OPT}(F)$ when F is perfectly packable are quite restricted:
the only possibilities are $\Theta(\sqrt{n})$ and $O(1)$. In the latter case we say F is not only
perfectly packable but is also a bounded waste distribution. In this paper, we shall
present the following results.

1. For any perfectly packable distribution F, the Sum-of-Squares algorithm is almost
perfect: $EW_n^{SS}(F) = O(\sqrt{n})$ [Theorem 2.4].

2. If F is a bounded waste distribution, then $EW_n^{SS}(F)$ is either $O(1)$ or $\Theta(\log n)$
and there is a simple combinatorial property that F must satisfy for the first
case to hold [Theorems 3.4 and ...]. In particular, $EW_n^{SS}(F) = O(1)$ for the
discrete uniform distributions U{j, k}, $j < k - 1$, of [AM98], [CCG+91], [CCG+00],
[CJSW93], [KRS98], which are the main discrete distributions studied to date.

3. There is a simple $O(nB)$-time deterministic variant SS' on SS that has bounded
expected waste for all bounded waste distributions and $O(\sqrt{n})$ waste for all
perfectly packable distributions [Theorem 3.10].

4. There is a linear-programming (LP) based approach that, in time polynomial in B
and the number of bits required to describe the probability vector $p_F$, determines
whether F is perfectly packable. If so, it determines whether F is also a bounded
waste distribution. If not, it computes the value of $\limsup_{n \to \infty} (EW_n^{OPT}(F)/n)$
[Theorems 5.3, 5.2, and 5.6]. Note that since the running time is polynomial in B
rather than in $\log B$, the algorithm technically runs in pseudopolynomial time.
We cannot hope for a polynomial time algorithm unless P = NP since the problem
solved is NP-hard [CCG+00]. Moreover, all previous LP-based approaches
took time exponential in B.

5. For the case where F is not perfectly packable, there are lower bound examples
and upper bound theorems showing that $1.5 \le \max_F ER_\infty^{SS}(F) \le 3$, and that for
all lists L, we have $SS(L) \le 3\,OPT(L)$, where $A(L)$ is the number of bins used
when algorithm A is applied to list L [Theorems 4.1 and 4.2].

6. For any fixed F, there is a randomized $O(nB)$-time on-line algorithm $SS_F$ such
that $EW_n^{SS_F}(F) \le EW_n^{OPT}(F) + O(\sqrt{n})$ and hence $ER_\infty^{SS_F}(F) = 1$. Algorithm
$SS_F$ is based on SS and, given F, can be constructed using the algorithm of (4)
above [Theorem ...].

7. There is a randomized $O(nB)$-time on-line algorithm SS* that for any F with bin
capacity B has $EW_n^{SS^*}(F) = \Theta(EW_n^{OPT}(F))$ and also $EW_n^{SS^*}(F) \le EW_n^{OPT}(F) +
O(n^{1/2})$, the latter implying that $ER_\infty^{SS^*}(F) = 1$. This algorithm works by learning
the distribution and using the algorithms of (4) and (6) [Theorem 6.2].

8. SS can maintain its good behavior even in the face of a non-oblivious adversary
who gets to choose the item size distribution at each step (subject to appropriate
restrictions) [Theorems 7.1 and 7.2].

9. The good average case behavior of SS is at least partially preserved under many
(but not all) natural variations on its sum-of-squares objective function and the
accuracy with which it is updated. Moreover, there is a variant of SS that runs
in time $O(n \log B)$ instead of $\Theta(nB)$ and has the same qualitative behavior as
specified for SS' above in (3) [Theorems 8.1 through 8.10].



Several of these results were conjectured based on experimental evidence in [CJK+99],
which also introduced the main linear program of (4). This linear program turns out to
be essentially equivalent to one previously introduced by Valério de Carvalho in his arc
flow model for bin packing [Val99], but has not previously been adapted to questions
of average case behavior.



1.3 Previous results 



The relevant previous results can be divided into two classes: (1) results for practical 
algorithms on specific distributions, and (2) more general (and less practical) results 
about the existence of algorithms. We begin with (1). 

The average case behavior under discrete distributions for standard heuristics has 
been studied in [AM98], [CCG+91], [CCG+00], [CJSW93], [PSW97], [KRS98]. These papers
concentrated on the discrete uniform distributions U{j, k} mentioned above, where the
bin capacity B = k and the item sizes are $1, 2, \ldots, j < k$, all equally likely. If $j = k - 1$,
the distribution is symmetric and we have by earlier results that the optimal packing
and the off-line First and Best Fit Decreasing algorithms (FFD and BFD) all have
$\Theta(\sqrt{n})$ expected waste, as do the on-line First Fit (FF) and Best Fit (BF) algorithms
[CCG+91], [PSW97].

More interesting is the case when $1 \le j \le k - 2$. Now the optimal expected waste is
$O(1)$ [CCG+91], [CCG+00], [CCG+02], and the results for traditional algorithms do not
always match this. In [CCG+91] it was shown that BFD and FFD have $\Theta(n)$ waste
for U{6, 13}, and [CJM+] identifies a wide variety of other U{j, k} with $j < k - 1$ for
which these algorithms have linear waste. For the on-line algorithms FF and BF, the
situation is no better. Although they can be shown to have $O(1)$ waste when $j = O(\sqrt{k})$
[CCG+91], when $j = k - 2$ [AM98], [KRS98], and (in the case of BF) for specified pairs
(j, k) with $k \le 14$ [CJSW93], for most values of (j, k) it appears experimentally that
their expected waste is linear. This has been proved for BF and the pairs (8, 11) and
(9, 12) [CJSW93] as well as all pairs j, k with $j/k \in [0.66, 2/3)$ when k is sufficiently
large [KM00]. In contrast, $EW_n^{SS}(U\{j, k\}) = O(1)$ whenever $j < k - 1$. On the other
hand, our current best implementation of basic SS runs in time $\Theta(nB)$ compared to
$O(n \log B)$ for BF, $O(n + B \log^2 B)$ for FFD, and $O(n + B \log B)$ for BFD [CJM+]. (The
fastest known implementation of FF is $\Theta(n \log n)$ and so FF is asymptotically slower
than SS for fixed B.)

Turning to less distribution-specific results, the first relevant results concerned
off-line algorithms. In the 1960's, Gilmore and Gomory in [GG61], [GG63] introduced a
deterministic approach to solving the bin packing problem that used linear programming,
column generation, and rounding to find a packing that for any list L with J or
fewer distinct item sizes is guaranteed to use no more than $OPT(L) + J - 1$ bins. Since
$J \le B$ for any discrete distribution, this implies an average-case performance that is
at least as good as that specified for SS* in (7) of the previous section, and is in some
cases better. However, although the approach often seems to work well in practice, its
worst-case running time is conceivably exponential in B, since the basic LP involved
in the approach has that many (implicit) variables.

A packing obeying a similar bound can be constructed in time polynomial in B 
by using the ellipsoid method to solve the basic LP of (4) above and then greedily 
extracting a packing from the variables of a basic optimal solution, as explained in 
[ABD+]. A simplistic analysis of the running time yields a running time bound of
0{n + {JBY'^log^n), which is linear but with an additive constant that for many 
distributions would render the algorithm impractical. However, if one uses the simplex 
method rather than the ellipsoid method to solve the LP's, this approach too seems to 
work well in practice. 

Theoretically the best approach along these lines is the off-line deterministic
algorithm of Karmarkar and Karp that for any list L never uses more than
$OPT(L) + O(\log^2 J)$ bins and takes time $O(n + J^8 \log J \log^2 n)$. Although these guarantees are
asymptotically stronger than those for the previous two approaches, the
Karmarkar-Karp algorithm is substantially more complicated and inherently requires the
performance of ellipsoid method steps. (This Karmarkar-Karp algorithm is closely
related to the more famous one from the same paper that guarantees a packing within
$OPT(L) + O(\log^2(OPT))$ for all lists L, independent of the number of distinct item
sizes, but for which the best current running time bound is $O(n^8 \log^3 n)$.)

For on-line algorithms, the most general results are those of Rhee and Talagrand
[Rhe88], [RT93a], [RT93b]. In [RT93a], Rhee and Talagrand proved that for any
distribution F (discrete or not) there exists an $O(n \log n)$ on-line randomized algorithm
$A_F$ satisfying $EW_n^{A_F}(F) \le EW_n^{OPT}(F) + O(\sqrt{n} \log^{3/4} n)$ and hence $ER_\infty^{A_F}(F) = 1$.
(For distributions with irrational sizes and/or probabilities, their results assume a
real-number RAM model of computation.) This is a more general result than (6) above, and
although the additive error term is worse than the one in (6), the extra factor of $\log^{3/4} n$
appears to reduce to a constant depending only on B when F is a discrete distribution,
making the two bounds comparable. Unfortunately, Rhee and Talagrand only prove
that such algorithms exist. The details of the algorithms depend on a non-constructive
characterization of F and its packing properties given in [Rhe88].

In [RT93b], Rhee and Talagrand present a single (constructive) on-line randomized
algorithm A that works for all distributions F (discrete or not) and has $EW_n^A(F) \le
EW_n^{OPT}(F) + O(\sqrt{n} \log^{3/4} n)$, again with the $\log^{3/4} n$ factor likely to reduce to a function
of B for discrete distributions. Even so, for discrete distributions this algorithm is not
quite as good as our algorithm SS*, which itself has $EW_n^{SS^*}(F) \le EW_n^{OPT}(F) + O(\sqrt{n})$
for all discrete distributions and in addition gets bounded waste for bounded waste
distributions. Moreover, the algorithm of [RT93b] is unlikely to be practical since it
uses the Karmarkar-Karp algorithm (applied to the items seen so far) as a subroutine.

The fastest on-line algorithms previously known that guarantee an $O(\sqrt{n})$ expected
waste rate for perfectly packable discrete distributions are due to Courcoubetis and
Weber, who used them in the proof of their characterization theorem in [CW90]. These
algorithms are distribution-dependent, but for fixed F run in linear time. At each step,
the algorithm must solve a linear program whose number of variables is potentially
exponential in B, but for fixed F this takes constant time, albeit potentially a large
constant. Moreover, for bounded waste distributions, the Courcoubetis-Weber
algorithms have $EW_n^A(F) = O(1)$, whereas the Rhee-Talagrand algorithms cannot provide
any guarantee better than $O(\sqrt{n})$. On the other hand, the Rhee-Talagrand algorithms
of [RT93a], [RT93b] guarantee $ER_\infty^A(F) = 1$ for all distributions, while Courcoubetis and
Weber in [CW90] only do this for those distributions in which $EW_n^{OPT}(F) = O(\sqrt{n})$.

Thus, although these earlier general approaches rival the packing effectiveness of 
SS and its variants, and in the case of the offline algorithms actually can do somewhat 
better, none are likely to be as widely usable in practice (certainly none of the online 
rivals will be), and none has the elegance and simplicity of the basic SS algorithm. 



1.4 Outline of the Paper 

The remainder of this paper is organized as follows. In Section 2 we present the details 
of the Courcoubetis- Weber characterization theorem and prove our result about the 
behavior of SS under perfectly packable distributions. In Section 3 we prove our 
results for bounded waste distributions. Section 4 covers our linear-programming-based 
algorithm for characterizing $EW_n^{OPT}(F)$ given F. In Section 5 we discuss our results
about the behavior of SS under linear waste distributions. In Section 6 we discuss our 
results about how SS can be modified so that its expected behavior is asymptotically 
optimal for such distributions. Section 7 presents our results about how SS behaves 
in more adversarial situations. Section 8 covers our results about the effectiveness of 
algorithms that use variants on the sum-of-squares objective function or trade accuracy 
in measuring that function for improved running times. We conclude in Section 9 with 
a discussion of open problems and related results, such as the recent extension of the
Sum-of-Squares algorithm to the bin covering problem in [CJK01].



2 Perfectly Packable Distributions 

In order to explain why the Sum-of-Squares algorithm works so well, we need first to
understand the characterization theorem of Courcoubetis and Weber [CW90], which
we now describe.

Given a discrete distribution F, a perfect packing configuration is a length-J vector
$b = (b_1, b_2, \ldots, b_J)$ of nonnegative integers such that $\sum_{i=1}^{J} b_i s_i = B$. Such a
configuration corresponds to a way of completely filling a bin with items from F. That is, if
we take $b_i$ items of size $s_i$, $1 \le i \le J$, we will precisely fill a bin of capacity B. Let
$\Lambda_F$ be the rational cone generated by the set of all perfect packing configurations for
F, that is, the closure under rational convex combinations and positive rational scalar
multiplication of the set of all such configurations.
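
For small instances, the perfect packing configurations that generate the cone can be listed by brute force. The Python sketch below is only meant to make the definition concrete; the function name `perfect_configs` is an assumption, and the enumeration can take time exponential in B, so it is not a substitute for the LP-based approach discussed in the introduction.

```python
from typing import List, Tuple


def perfect_configs(sizes: List[int], B: int) -> List[Tuple[int, ...]]:
    """All vectors b of nonnegative integers with sum(b[i] * sizes[i]) == B."""
    found: List[Tuple[int, ...]] = []

    def extend(i: int, remaining: int, prefix: List[int]) -> None:
        if i == len(sizes):
            if remaining == 0:              # the bin is exactly full
                found.append(tuple(prefix))
            return
        # try every feasible multiplicity of the i-th size
        for count in range(remaining // sizes[i] + 1):
            extend(i + 1, remaining - count * sizes[i], prefix + [count])

    extend(0, B, [])
    return found
```

For example, with B = 6 and sizes 2 and 3, the only perfect packing configurations are (3, 0) and (0, 2): three items of size 2, or two items of size 3.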

Definition 2.1 A rational vector $x = (x_1, \ldots, x_J)$ is in the interior of a cone $\Lambda$ if and
only if there exists an $\epsilon > 0$ such that all nonnegative rational vectors $y = (y_1, \ldots, y_J)$
satisfying $|x - y| \equiv \sum_{i=1}^{J} |x_i - y_i| \le \epsilon$ are in $\Lambda$.

Theorem (Courcoubetis-Weber [CW90]). Let $p_F$ denote the vector of size
probabilities $(p_1, p_2, \ldots, p_J)$ for a discrete distribution F.

(a) $EW_n^{OPT}(F) = O(1)$ if and only if $p_F$ is in the interior of $\Lambda_F$.

(b) $EW_n^{OPT}(F) = \Theta(\sqrt{n})$ if and only if $p_F$ is on the boundary of $\Lambda_F$, i.e., is in $\Lambda_F$
but not in its interior.

(c) $EW_n^{OPT}(F) = \Theta(n)$ if and only if $p_F$ is outside $\Lambda_F$.

The Courcoubetis-Weber Theorem can be used to prove the following lemma, which
is key to many of the results that follow:

Lemma 2.2 Let F be a perfectly packable distribution with bin size B, P be an
arbitrary packing into bins of size B, $x$ be an item randomly generated according to
F, and $P'$ be the packing resulting if $x$ is packed into P according to SS. Then
$E[ss(P') \mid P] \le ss(P) + 2$.

Proof. The proof relies on the following claim. 



Claim 2.2.1 If F is a perfectly packable distribution with bin size B, then there is
an algorithm $A_F$ such that given any packing P into bins of size B, $A_F$ will pack an
item randomly generated according to F in such a way that for each bin level $h$ with
$N_P(h) > 0$, $1 \le h \le B - 1$, the probability that $N_P(h)$ increases is no more than the
probability that it decreases.



Proof of Claim. The algorithm $A_F$ depends on the details of the Courcoubetis-Weber
Theorem. Since F is perfectly packable, $p_F$ must be in $\Lambda_F$ and so there must exist
some number m of length-J nonnegative integer vectors $b_i$ and corresponding positive
rationals $\alpha_i$ satisfying

$$\sum_{j=1}^{J} (b_{i,j} \cdot s_j) = B, \quad 1 \le i \le m \qquad (2.1)$$

$$\sum_{i=1}^{m} (\alpha_i \cdot b_{i,j}) = p_j, \quad 1 \le j \le J \qquad (2.2)$$

Now since the $\alpha_i$ and $p_j$ are all rational, there exists an integer Q such that $Q \cdot \alpha_i$ and
$Q \cdot p_j$ are integral for all $i$ and $j$. Consider the ideal packing $P^*$ which has $Q\alpha_i$ copies
of bins of type $b_i$. We will use $P^*$ to define $A_F$. Note that by (2.2) $P^*$ contains $Qp_j$
items of size $s_j$, $1 \le j \le J$, and hence a total of Q items. Let $L_F = \{x_1, x_2, \ldots, x_Q\}$
denote the Q items packed into $P^*$, and denote the bins of $P^*$ as $Y_1, Y_2, \ldots, Y_{|P^*|}$.

Now let P be an arbitrary packing of integer-size items into bins of size B. We
claim that for each bin Y of the packing $P^*$, there is an ordering $y_1, y_2, \ldots, y_{|Y|}$ of the
items contained in Y and a special threshold index $last(Y) < |Y|$ such that if we set
$S_i \equiv \sum_{j=1}^{i} s(y_j)$, $0 \le i \le |Y|$, then the following holds:

1. P has partially filled bins with each level $S_1 < S_2 < \cdots < S_{last(Y)}$.

2. P has no partially filled bin of level $S_{last(Y)} + s(y_i)$ for any $i > last(Y)$.

That such an ordering and threshold index always exist can be seen from Figure 1,
which presents a greedy procedure that, given the current packing P, will compute
them. Assume we have chosen such an ordering and threshold index for each bin in
$P^*$. Note that $S_{|Y|} = B$ for all such bins Y, since each is by definition perfectly packed.

Our algorithm $A_F$ begins the processing of an item $a$ by first randomly identifying
it with an appropriate element $r(a) \in L_F$. In particular, if $a$ is of size $s_j$, then $r(a)$ is
one of the $Q \cdot p_j$ items in $L_F$ of size $s_j$, with all such choices being equally likely. Note
that this implies that for each $i$, $1 \le i \le Q$, the probability that a randomly generated
item $a$ will be identified with $x_i$ is $1/Q$.



1. Let the set U of as-yet-unordered items initially be set to Y and let S = 0 be
   the initial total size of ordered items.
2. While U is nonempty and last(Y) is undefined, do the following:
   2.1 If there is an item x in U such that P has a partially filled bin of
       level S + s(x):
       2.1.1 Choose such an x, put it next in the ordering, and remove it from U.
       2.1.2 Set S = S + s(x).
   2.2 Otherwise, set last(Y) to be the number of items ordered so far and exit
       the While loop.
3. Complete the ordering by appending the remaining items in U in arbitrary order.

Figure 1: Procedure for ordering items in bin Y given a packing P
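
The greedy procedure of Figure 1 translates directly into code. In the Python sketch below, which is an illustration rather than part of the original analysis, a bin Y is represented simply by the list of its item sizes and the packing P by the set of levels that currently have at least one partially filled bin; the function name and both representations are assumptions made here. Where Figure 1 allows any eligible item to be chosen, the sketch takes the first one in list order.

```python
from typing import List, Set, Tuple


def order_bin_items(Y: List[int], levels_in_P: Set[int]) -> Tuple[List[int], int]:
    """Return (ordering, last) for a perfectly packed bin Y, following Figure 1.

    Y holds the item sizes of the bin; levels_in_P holds every level h,
    0 < h < B, for which P has at least one partially filled bin.
    """
    unordered = list(Y)
    ordering: List[int] = []
    S = 0                                   # total size of the items ordered so far
    last = None
    while unordered and last is None:
        # Step 2.1: look for an item whose addition reaches a level present in P
        match = next((x for x in unordered if S + x in levels_in_P), None)
        if match is not None:
            ordering.append(match)
            unordered.remove(match)
            S += match
        else:
            last = len(ordering)            # Step 2.2: threshold index found
    if last is None:
        # Only reachable if levels_in_P wrongly contains the full level B.
        last = len(ordering)
    ordering.extend(unordered)              # Step 3: remaining items, any order
    return ordering, last
```

For a bin with item sizes 4, 2, 3, 2 (so B = 11) and a packing whose only nonempty levels are 2 and 5, the sketch returns the ordering 2, 3, 4, 2 with last(Y) = 2.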

Having chosen $r(a)$, we then determine the bin into which we should place $a$ as
follows. Suppose that in $P^*$, item $r(a)$ is in bin Y and has index $j$ in the ordering of
items in that bin.

(i) If $j = 1$, place $a$ in an empty bin, creating a new bin with level $s(a) = S_1$.

(ii) If $1 < j \le last(Y)$, place $a$ in a bin with level $S_{j-1}$, increasing its level to $S_j$.

(iii) If $j > last(Y)$, place $a$ in a bin of level $S_{last(Y)}$ (or in a new bin if $last(Y) = 0$).

For example, suppose that the items in Y, in our constructed order, are of size
2, 3, 2, and 4 and $last(Y) = 2$. Then $S_1 = 2$, $S_2 = 5$, $S_3 = 7$, $S_4 = B = 11$,
$N_P(2), N_P(5) > 0$, and $N_P(7) = N_P(9) = 0$. If $r(a) \in Y$, then it is with equal
probability the first 2, the 3, the second 2, or the 4. In the first case it starts a new bin,
creating a bin of level 2 and increasing $N_P(2)$ by 1. In the second it goes in a bin of level
2, converting it to a bin of level 5, thus decreasing $N_P(2)$ by 1 and increasing $N_P(5)$ by
1. In the third and fourth cases it goes in a bin of level 5, converting it to a bin of level
7 or 9, depending on the case, and decreasing $N_P(5)$ by 1. Thus when $r(a) \in Y$, the
only positive level counts that can change are those for $h \in \{2, 5\} = \{S_1, S_2 = S_{last(Y)}\}$,
counts can only change by 1, and each count is at least as likely to decline as to increase.

More generally, for any bin Y in $P^*$, if $a$ is randomly generated according to F and
$r(a) \in Y$, then by the law of conditional probabilities $r(a)$ will take on each of the
values $y_i$, $1 \le i \le |Y|$, with probability $p = 1/|Y|$. Thus if $r(a) \in Y$ the probability
that the count for level $S_i$ increases equals the probability that it decreases when
$1 \le i < last(Y)$. The probability that the count for $S_{last(Y)}$ decreases is at least as
large as the probability that it increases (greater if $last(Y) \le |Y| - 2$). And for all
other levels with positive counts, the probability that a change occurs is 0. Since this
is true for all bins Y of the ideal packing $P^*$, the Claim follows. ■

Claim 2.2.1 is used to prove Lemma 2.2 as follows. Note that the claim implies a
bound on the expected increase in $ss(P)$ when a new item is packed under $A_F$. For
any level count $x > 0$, the expected increase in $ss(P)$ given that this particular count
changes is, by the claim, at most

$$\frac{1}{2}\left((x+1)^2 - x^2\right) + \frac{1}{2}\left((x-1)^2 - x^2\right) = 1.$$

More trivially, the expected increase in $ss(P)$ given that a 0-count changes is also at
most 1. Since a placement changes at most two counts, this means that the expected
increase in $ss(P)$ using algorithm $A_F$ is at most 2. Since SS explicitly chooses the
placement of each item so as to minimize the increase in $ss(P)$, we thus must also have
that the expected increase in $ss(P)$ under SS is at most 2 at each step. ■

Lemma 2.2 is exploited using the following result.

Lemma 2.3 Suppose P is a packing of a randomly generated list $L_n(F)$, where F is
a discrete distribution with bin size B and $n > 0$. Then

$$E[W(P)] \le \sqrt{(B-1)\,E[ss(P)]}.$$

Proof. For $1 \le i \le n$ let $C_i = \sum_{h=1}^{B-1} p[N_P(h) = i]$, i.e., the expected number of levels
whose count in P equals $i$. Then $\sum_{i=1}^{n} C_i \le B - 1$ and

$$E[ss(P)] = \sum_{h=1}^{B-1} E\left[N_P(h)^2\right] = \sum_{i=1}^{n} C_i \cdot i^2.$$

We now apply the Cauchy-Schwarz inequality, which says that
$\left(\sum_i x_i y_i\right)^2 \le \left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right)$.
Let $x_i = \sqrt{C_i}$ and $y_i = i\sqrt{C_i}$, $1 \le i \le n$. We then have

$$\left(\sum_{i=1}^{n} C_i \cdot i\right)^2 \le \left(\sum_{i=1}^{n} C_i\right)\left(\sum_{i=1}^{n} C_i \cdot i^2\right). \qquad (2.3)$$

Taking square roots and using (2.3), we get

$$E\left[\sum_{h=1}^{B-1} N_P(h)\right] = \sum_{i=1}^{n} C_i \cdot i \le \sqrt{(B-1)\,E[ss(P)]}. \qquad (2.4)$$

Since no partially full bin has more than $(B-1)/B < 1$ waste, the claimed result
follows. ■



Theorem 2.4 Suppose F is a discrete distribution satisfying $EW_n^{OPT}(F) = O(\sqrt{n})$.
Then $EW_n^{SS}(F) \le \sqrt{2nB}$.

Proof. By Lemma 2.2 and the linearity of expectations, we have

$$E\left[ss\left(P_n^{SS}(F)\right)\right] \le 2n.$$

The result follows by Lemma 2.3.
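
Spelled out, the two steps of this proof combine as follows. This is a restatement under the bounds of Lemmas 2.2 and 2.3 as given above, writing $P_i$ for the SS packing after $i$ items, a notation introduced only for this display.

```latex
% Step 1: the empty packing has ss = 0, and each item adds at most 2 in expectation.
E[ss(P_0)] = 0, \qquad
E[ss(P_i)] \le E[ss(P_{i-1})] + 2 \quad (1 \le i \le n)
\;\;\Longrightarrow\;\; E[ss(P_n)] \le 2n.

% Step 2: apply the bound of Lemma 2.3 to P_n.
EW_n^{SS}(F) = E[W(P_n)]
  \le \sqrt{(B-1)\,E[ss(P_n)]}
  \le \sqrt{2n(B-1)}
  < \sqrt{2nB}.
```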



3 Bounded Waste Distributions 

In order to distinguish the broad class of bounded waste distributions under which SS 
performs well, we need some new definitions. If F is a discrete distribution, let $U_F$
denote the set of sizes with positive probability under F.

Definition 3.1 A level $h$, $1 \le h \le B - 1$, is a dead-end level for F if there is some
collection of items with sizes in $U_F$ whose total size is $h$, but there is no such collection
whose total is $B - h$.

In other words, if $h$ is a dead-end level then it is possible to pack a bin to level
$h$ with items from $U_F$, but once such a bin has been created, it is impossible to fill
it completely. Note that the dead-end levels for F depend only on $U_F$ and can be
identified in time $O(|U_F| B)$ by dynamic programming.
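
One way to carry out this dynamic program is sketched below in Python. The function name `dead_end_levels` is an assumption made for this illustration. The table `reachable[t]` records whether some collection of items with sizes in $U_F$ has total size exactly `t`, and filling it takes $O(|U_F| B)$ time.

```python
from typing import Iterable, List


def dead_end_levels(sizes: Iterable[int], B: int) -> List[int]:
    """Levels h, 1 <= h <= B-1, that are reachable while B - h is not."""
    sizes = list(sizes)
    reachable = [False] * (B + 1)
    reachable[0] = True                     # the empty collection has total 0
    for total in range(1, B + 1):
        # total is reachable if removing one item leaves a reachable total
        reachable[total] = any(
            s <= total and reachable[total - s] for s in sizes
        )
    return [h for h in range(1, B) if reachable[h] and not reachable[B - h]]
```

For B = 6 and sizes 2 and 3, the sketch reports level 5 as the only dead-end level, and it reports none whenever size 1 is present.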

Observation 3.2 For future reference, note the following easy consequences of the
definition of dead-end level.

(a) The algorithms $A_F$ of Claim 2.2.1 in the proof of Lemma 2.2 never create bins
that have dead-end levels. (This is because the levels of the bins they create are
always the sums of item sizes from a perfectly packed bin.)

(b) If F is a perfectly packable distribution, then for no $s_j \in U_F$ is $s_j$ a dead-end
level. (Otherwise, no bin containing items of size $s_j$ could be perfectly packed.
Since the expected number of such bins in an optimal packing is at least $np_j/B$,
this means that the expected waste would have to be at least $np_j/B^2$ and hence
linear, contradicting the assumption that F is a perfectly packable distribution.)

(c) No distribution with $1 \in U_F$ can have a dead-end level, so that in particular the
U{j, k} do not have dead-end levels.

A simple example of a distribution that does have dead-end levels is any F that has
B = 6 and $U_F = \{2, 3\}$. Here 5 is a dead-end level for F while 1, 2, 3, 4 are not. There
is a sense, however, in which this distribution is still fairly benign.

Definition 3.3 A level $h$ is nontrivial for a distribution F if there is some list L with
item sizes from $U_F$ such that the SS packing P of L has $N_P(h) > 1$.

It is easy to verify that there are no nontrivial levels, dead-end or otherwise, in the
above B = 6 example.



We shall divide this section into three parts. In subsection 3.1 we show that SS has
bounded expected waste for bounded waste distributions with no nontrivial dead-end
levels. In subsection 3.2 we show that a simple variant on SS has bounded expected
waste for all bounded waste distributions. In subsection 3.3 we characterize the
behavior of SS for bounded waste distributions that do have nontrivial dead-end levels.

3.1 A bounded expected waste theorem for SS

Theorem 3.4 If F is a bounded waste distribution with no nontrivial dead-end levels,
then $EW_n^{SS}(F) = O(1)$.



To prove this result we rely on the Courcoubetis-Weber Theorem, Lemma 2.2, and
the following specialization of a result of Hajek [Haj82].

Hajek's Lemma. Let S be a state space and let $\mathcal{F}_k$, $k \ge 1$, be a sequence of functions,
where $\mathcal{F}_k$ maps $S^{k-1}$ to probability distributions over S. Let $X_1, X_2, \ldots$ be a sequence of
random variables over S generated as follows: $X_1$ is chosen according to $\mathcal{F}_1(\cdot)$ and $X_k$
is chosen according to $\mathcal{F}_k(X_1, \ldots, X_{k-1})$. Suppose there are constants $b > 1$, $\Delta < \infty$,
$D > 0$, and $\gamma > 0$ and a function $\phi$ from S to $[0, \infty)$ such that

(a) [Initial Bound Hypothesis]. $E\left[b^{\phi(X_1)}\right] < \infty$.

(b) [Bounded Variation Hypothesis]. For all $N \ge 1$, $|\phi(X_{N+1}) - \phi(X_N)| \le \Delta$.

(c) [Expected Decrease Hypothesis]. For all $N \ge 1$,

$$E\left[\phi(X_{N+1}) - \phi(X_N) \mid \phi(X_N) > D\right] \le -\gamma.$$

Then there are constants $c > 1$ and $T > 0$ such that for all $N \ge 1$, $E\left[c^{\phi(X_N)}\right] < T$.

Note that the conclusion of this lemma implies that there is also a constant $T'$
such that $E[\phi(X_N)] < T'$ for all N. A weaker version of the lemma was used in the
analyses of the Best and First Fit bin packing heuristics in [AM98], [CJSW93], [KRS98].
The added strength is not needed for Theorem 3.4, but will be used in the proof of
Theorem 3.11.



We prove Theorem 3.4 by applying Hajek's Lemma with the following interpretation. The state space $S$ is the set of all length-$(B-1)$ vectors of non-negative integers $x = (x_1, x_2, \ldots, x_{B-1})$, where we view $x$ as the profile of a packing that has $x_i$ bins with level $i$, $1 \le i \le B-1$. $X_0$ is then the profile of the empty packing and $X_{i+1}$ is the profile of the packing obtained by generating a random item according to $F$ and packing it according to SS into a packing with profile $X_i$. The potential function is

$$\phi(x) = \sqrt{\sum_{i=1}^{B-1} x_i^2}.$$



Note that if the hypotheses of Hajek's Lemma are satisfied under this interpretation, then the lemma's conclusion would say that there is a $T'$ such that for all $N$,

$$E\left[\sqrt{\sum_{i=1}^{B-1} X_{N,i}^2}\,\right] \le T',$$

which implies that $E[X_{N,i}]$ is bounded by $T'$ as well, $1 \le i \le B-1$. Thus the expected waste is less than the constant $BT'$ and Theorem 3.4 would be proved.
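The state transition and potential function used in this interpretation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ss_step` and `phi` are hypothetical, the profile is stored as a list `counts` with `counts[h]` the number of bins at level `h`, and ties between equally good placements are broken toward the lowest level, a choice made for this sketch rather than the tie-breaking rule of the paper.

```python
import math

def ss_step(counts, s, B):
    """Pack one item of size s by the Sum of Squares rule (illustrative sketch).

    counts[h] is the number of bins at level h, 1 <= h <= B-1 (counts[0] is unused).
    The item is placed so that the resulting sum of squared counts is smallest;
    level 0 stands for opening a new bin. Ties go to the lowest level, which is
    an assumption of this sketch.
    """
    def delta(h):
        new = h + s
        d = 0
        if h > 0:
            d -= 2 * counts[h] - 1      # counts[h] drops by one
        if new < B:
            d += 2 * counts[new] + 1    # counts[new] rises by one
        return d

    options = [h for h in range(0, B - s + 1) if h == 0 or counts[h] > 0]
    h = min(options, key=lambda lvl: (delta(lvl), lvl))
    if h > 0:
        counts[h] -= 1
    if h + s < B:
        counts[h + s] += 1

def phi(counts):
    """Potential phi(x): square root of the sum of squared level counts."""
    return math.sqrt(sum(c * c for c in counts[1:]))
```

For example, with `B = 6` and an empty profile, packing two items of size 3 first opens a bin at level 3 and then fills it, so the potential goes from 0 to 1 and back to 0.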

Hence all we need to show is that the three hypotheses of Hajek's lemma apply. The Initial Bound Hypothesis applies since the profile of an empty packing is all 0's and hence $\phi(X_0) = 0$. The following lemma implies that the Bounded Variation Hypothesis also holds.

Lemma 3.5 Let $x$ be the profile of a packing into bins of size $B$, and let $x'$ be the profile of the packing obtained from $x$ by adding an item to the packing in any legal way. Then

$$|\phi(x') - \phi(x)| \le 1.$$

Proof. Consider the case when $\phi(x') > \phi(x)$ and suppose that $i$ is the level whose count increases when the item is packed. We have

$$\phi(x') - \phi(x) \le \frac{2x_i + 1}{\sqrt{\phi(x)^2 + 2x_i + 1} + \phi(x)} \le \frac{2x_i + 1}{\sqrt{x_i^2 + 2x_i + 1} + x_i} = 1.$$

A similar argument handles the case when $\phi(x') < \phi(x)$. ■
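Lemma 3.5 can be checked by brute force on small instances. The sketch below is illustrative only: the function names are hypothetical, and "any legal way" is taken to mean adding an item of any integer size to a new bin or to an existing bin without exceeding capacity `B`.

```python
import itertools
import math

def phi(x):
    """phi(x) for a profile x, where x[i-1] is the count for level i."""
    return math.sqrt(sum(c * c for c in x))

def successors(x, B):
    """Yield every profile reachable from x by adding one item in a legal way.

    A bin at level h (h = 0 means a new bin) moves to some level in h+1..B;
    full bins (level B) are not recorded in the profile.
    """
    for h in range(0, B):
        if h > 0 and x[h - 1] == 0:
            continue  # no bin of level h exists
        for new_level in range(h + 1, B + 1):
            y = list(x)
            if h > 0:
                y[h - 1] -= 1
            if new_level < B:
                y[new_level - 1] += 1
            yield tuple(y)

def max_variation(B, max_count):
    """Largest |phi(x') - phi(x)| over all profiles with counts in 0..max_count."""
    worst = 0.0
    for x in itertools.product(range(max_count + 1), repeat=B - 1):
        for y in successors(x, B):
            worst = max(worst, abs(phi(y) - phi(x)))
    return worst
```

The bound of 1 is attained, for instance, when the first item is packed into an empty packing.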

To complete the proof of the theorem, we need to show that the Expected De- 
crease Hypothesis of Hajek's Lemma applies. For this we need the following three 
combinatorial lemmas. 

Lemma 3.6 Let $y$ be any number and $a > 0$. Then

$$y - a \le \frac{y^2 - a^2}{2a}.$$

Proof. Note that $y - a = (y^2 - a^2)/(y + a)$, and then observe that no matter whether $y > a$ or $y < a$, this is less than or equal to $(y^2 - a^2)/2a$. ■
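As a cross-check on Lemma 3.6, the inequality can also be obtained by a one-line rearrangement that does not require dividing by $y + a$. This derivation is an addition for the reader and is not taken from the original proof.

```latex
\begin{align*}
(y-a)^2 \ge 0
  &\iff y^2 - 2ay + a^2 \ge 0 \\
  &\iff y^2 - a^2 \ge 2ay - 2a^2 = 2a\,(y-a) \\
  &\iff y - a \le \frac{y^2 - a^2}{2a}
  \qquad (\text{dividing by } 2a > 0).
\end{align*}
```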

Lemma 3.7 Let $F$ be a distribution with no nontrivial dead-end levels and let $P$ be any packing that can be created by applying SS to a list of items all of whose sizes are in $U_F$. If $x$ is the profile of $P$ and $\phi(x) > 2B^{3/2}$, then there is a size $s \in U_F$ such that if an item of size $s$ is packed by SS into $P$, the resulting profile $x'$ satisfies

$$\phi(x')^2 \le \phi(x)^2 - \frac{\phi(x)}{B^{3/2}}.$$

Proof. Suppose $x$ is as specified and let $h$ be the index for a level at which $x$ takes on its maximum value. It is easy to see that

$$x_h \ge \phi(x)/\sqrt{B}. \qquad (3.5)$$

Thus $x_h > 2B > 1$ and so by definition $h$ cannot be a dead-end level for $F$. Hence there must be a sequence of levels $h = \ell_0 < \ell_1 < \cdots < \ell_m = B$, $m < B$, such that for $1 \le i \le m$, $\ell_i - \ell_{i-1} \in U_F$. Taking $x_B = 0$ by convention, we have

$$x_h = \sum_{i=0}^{m-1} \left(x_{\ell_i} - x_{\ell_{i+1}}\right). \qquad (3.6)$$

Let $q$, $0 \le q < m$, be an index which yields the maximum value $\Delta$ for $x_{\ell_q} - x_{\ell_{q+1}}$, and let $s = \ell_{q+1} - \ell_q$. Then by (3.6) we have $\Delta \ge x_h/m \ge x_h/B \ge \phi(x)/B^{3/2}$, where the last inequality follows from (3.5). By Lemma 1.2 this means that if an item of size $s$ arrives, $\phi(x)^2$ must decline by at least

$$2\left(\frac{\phi(x)}{B^{3/2}}\right) - 2 = \frac{\phi(x) + \left(\phi(x) - 2B^{3/2}\right)}{B^{3/2}} \ge \frac{\phi(x)}{B^{3/2}},$$

as claimed. ■



Lemma 3.8 Let $F$ be a bounded waste distribution with $U_F = \{s_1, s_2, \ldots, s_J\}$. For each $i$, $1 \le i \le J$, and $\epsilon > 0$, let $F[i, \epsilon]$ be the distribution which decreases $p_i$ to $p_i' = (p_i - \epsilon)/(1 - \epsilon)$ and increases all other probabilities $p_j$ to $p_j' = p_j/(1 - \epsilon)$. Then there is a constant $\epsilon_0 > 0$ such that $F[i, \epsilon]$ is a perfectly packable distribution for all $i$, $1 \le i \le J$, and $\epsilon$, $0 < \epsilon \le \epsilon_0$.

Proof. Since $F$ is a bounded waste distribution and $p_i > 0$, $1 \le i \le J$, this follows from the Courcoubetis-Weber theorem, part (a). ■

We can now prove that the Expected Decrease Hypothesis of Hajek's Lemma applies, which will complete the proof of Theorem 3.4. Let $F$ be a bounded waste distribution with no nontrivial dead-end levels, and let $\epsilon_0$ be the value specified for $F$ by Lemma 3.8. Without loss of generality we may assume that $\epsilon_0 < 2$. Let $P$ be a packing as specified in Lemma 3.7 but with profile $x$ satisfying $\phi(x) > 4B^{3/2}/\epsilon_0 > 2B^{3/2}$. Let $i$ be the index of the size $s \in U_F$ whose existence is proved in Lemma 3.7, and let $F_i$ be the distribution that always generates an item of size $s_i$.

Consider the two-phase item generation process that first randomly chooses between distributions $F_i$ and $F[i, \epsilon_0]$, the first choice being made with probability $\epsilon_0$ and the second with probability $1 - \epsilon_0$. It is easy to see that this process is just a more complicated way of generating items according to distribution $F$. Now consider what happens when this process is used to add one item to packing $P$. If $F_i$ is chosen, then by Lemma 3.7, the value of $\phi^2$ declines by at least $\phi(x)/B^{3/2}$. If $F[i, \epsilon_0]$ is chosen, the expected value of $\phi^2$ increases by less than 2 by Lemma 2.2 and the fact that $F[i, \epsilon_0]$ is a perfectly packable distribution (Lemma 3.8). Thus if $x'$ is the resulting profile, we have by applying Lemma 3.6 for $a = \phi(x)$ and taking expectations

$$E\left[\phi(x') - \phi(x)\right] \le \left[(1 - \epsilon_0)(2) - \epsilon_0\,\frac{\phi(x)}{B^{3/2}}\right]\left(\frac{1}{2\phi(x)}\right) \le \frac{1}{\phi(x)} - \frac{\epsilon_0}{2B^{3/2}} \le -\frac{\epsilon_0}{4B^{3/2}}$$

since $\phi(x) > 4B^{3/2}/\epsilon_0$. Thus the Expected Decrease Hypothesis of Hajek's Lemma holds with $D = 4B^{3/2}/\epsilon_0$ and $\gamma = \epsilon_0/(4B^{3/2})$, and so Hajek's Lemma applies. Thus $EW_n^{SS}(F) = O(1)$, the conclusion of Theorem 3.4. ■
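The claim that the two-phase process reproduces $F$, and the final step of the expected-decrease chain, can be checked with a few lines of exact arithmetic. The sketch below is illustrative only: the function names, the three-size distribution `p`, and the value of `eps` are hypothetical and chosen purely for the check.

```python
from fractions import Fraction

def perturbed(p, i, eps):
    """F[i, eps]: p_i -> (p_i - eps)/(1 - eps), p_j -> p_j/(1 - eps) for j != i."""
    return [((pj - eps) if j == i else pj) / (1 - eps) for j, pj in enumerate(p)]

def two_phase(p, i, eps):
    """Size probabilities of the two-phase process: with probability eps emit
    size s_i outright, otherwise draw a size from F[i, eps]."""
    q = perturbed(p, i, eps)
    return [eps * (1 if j == i else 0) + (1 - eps) * qj for j, qj in enumerate(q)]

def drift_bound(phi, B, eps):
    """First upper bound on E[phi(x') - phi(x)] in the chain of inequalities."""
    return ((1 - eps) * 2 - eps * phi / B ** 1.5) / (2 * phi)

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical distribution
eps = Fraction(1, 10)                                  # hypothetical eps_0, at most min(p)
```

With these values, `two_phase(p, i, eps)` equals `p` for every index `i`, and `drift_bound` evaluated at `phi = 4 * B**1.5 / eps` is at most `-eps / (4 * B**1.5)`, matching the choice of $D$ and $\gamma$.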



3.2 Improving on SS for bounded waste distributions 

Unfortunately, although SS has bounded expected waste for bounded waste distributions with no nontrivial dead-end levels, it doesn't do so well for all bounded waste distributions. Consider the distribution $F$ with $B = 9$, $J = 2$, $s_1 = 2$, $s_2 = 3$, and $p_1 = p_2 = 1/2$. It is easy to see that $F$ is a bounded waste distribution, since 3's by themselves can pack perfectly, and only one 3 is needed for every three 2's in order that the 2's can go into perfectly packed bins. Note, however, that 8 is a nontrivial dead-end level for $F$, so Theorem 3.4 does not apply. In fact, $EW_n^{SS}(F) = \Omega(\log n)$, as the following informal reasoning suggests: It is likely that somewhere within a sequence of $n \log n$ items from $F$ there will be $\Omega(\log n)$ consecutive 2's. These are in turn likely to create $\Omega(\log n)$ bins of level 8, and hence, since 8 is a dead-end level, $\Omega(\log n)$ waste.
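The dead-end level in this example can be found mechanically. The sketch below assumes, based on how the term is used in this section, that a level $h$ with $1 \le h \le B-1$ is a dead-end level when $h$ can be reached with items whose sizes are in the given set but the remaining gap $B - h$ cannot be filled exactly with such items; the function names are hypothetical.

```python
def reachable_sums(sizes, B):
    """Totals in 0..B that can be written as a sum of item sizes (repeats allowed)."""
    ok = [False] * (B + 1)
    ok[0] = True
    for total in range(1, B + 1):
        ok[total] = any(s <= total and ok[total - s] for s in sizes)
    return {t for t in range(B + 1) if ok[t]}

def dead_end_levels(sizes, B):
    """Attainable levels whose gap to B cannot be filled exactly (assumed definition)."""
    sums = reachable_sums(sizes, B)
    return {h for h in range(1, B) if h in sums and (B - h) not in sums}
```

For $B = 9$ and sizes $\{2, 3\}$ the only such level is 8, since a gap of 1 cannot be filled, while every other attainable level leaves a gap that is a sum of 2's and 3's.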

Fortunately, this is the worst possible result for SS and a bounded waste distribution, as we shall see below in Theorem 3.11. First, however, let us show how a simple modification to SS yields a variant with the same running time that has $O(1)$ expected waste for all bounded waste distributions.

Like SS, this variant (SS') is on-line. It makes use of a parameterized variant $SS_D$ on the packing rule of SS, where $D$ is a set of levels. In $SS_D$, we place items so as to minimize $ss(P)$ subject to the constraint that no bin with level in $D$ may be created unless this is unavoidable. In the latter case we start a new bin. SS' works as follows. Let $U$ be the set of item sizes seen so far and let $D(U)$ denote the set of dead-end levels for $U$. (Initially, $U$ is the empty set.) Whenever an item arrives, we first check if its size is in $U$. If not, we update $U$ and recompute $D(U)$. Then we pack the item according to $SS_{D(U)}$. A first observation about SS' is the following.

Lemma 3.9 If $F$ is a perfectly packable distribution, then SS' will never create a dead-end level when packing a sequence of items with sizes in $U_F$.

Proof. By Observation 3.2(b), starting a bin with an item whose size is in $U_F$ can never create a dead-end level for $U_F$. On the other hand, if SS' puts an item in a partially full bin, it must by definition be the case that the new level is not a dead-end level for $U$. Thus, since the new level is attainable using items whose sizes are in $U$, the resulting gap must be precisely fillable with items whose sizes are in $U \subseteq U_F$. Thus the new level is not a dead-end level for $U_F$ either. ■
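The rule $SS_{D(U)}$ and the bookkeeping of SS' described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, dead-end levels are computed under the assumed definition "attainable level whose gap to $B$ is not a sum of seen sizes", and ties are broken toward the lowest level.

```python
def dead_end_levels(sizes, B):
    """Attainable levels h in 1..B-1 whose gap B-h is not a sum of the given sizes."""
    ok = [True] + [False] * B
    for t in range(1, B + 1):
        ok[t] = any(s <= t and ok[t - s] for s in sizes)
    return {h for h in range(1, B) if ok[h] and not ok[B - h]}

class SSPrime:
    """Sketch of SS': Sum of Squares that avoids dead-end levels for the sizes seen so far."""

    def __init__(self, B):
        self.B = B
        self.counts = [0] * B   # counts[h]: bins at level h, 1 <= h <= B-1
        self.full = 0           # number of completely filled bins
        self.seen = set()       # U: item sizes seen so far
        self.dead = set()       # D(U): dead-end levels for U

    def _delta(self, h, new):
        """Change in ss(P) when a bin moves from level h (0 = new bin) to level new."""
        d = 0
        if h > 0:
            d -= 2 * self.counts[h] - 1
        if new < self.B:
            d += 2 * self.counts[new] + 1
        return d

    def pack(self, s):
        if s not in self.seen:               # new size: update U and recompute D(U)
            self.seen.add(s)
            self.dead = dead_end_levels(self.seen, self.B)
        options = [h for h in range(0, self.B - s + 1) if h == 0 or self.counts[h] > 0]
        allowed = [h for h in options if (h + s) not in self.dead]
        if allowed:
            h = min(allowed, key=lambda lvl: (self._delta(lvl, lvl + s), lvl))
        else:
            h = 0                            # dead-end level unavoidable: start a new bin
        if h > 0:
            self.counts[h] -= 1
        if h + s < self.B:
            self.counts[h + s] += 1
        else:
            self.full += 1
```

Run on items of sizes 2 and 3 with $B = 9$, this sketch never creates a bin of level 8, in line with Lemma 3.9.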

Theorem 3.10

(i) If $F$ is a perfectly packable distribution, then $EW_n^{SS'}(F) = O(\sqrt{n})$.

(ii) If $F$ is a bounded waste distribution, then $EW_n^{SS'}(F) = O(1)$.

Proof. We begin by bounding the expected number of items that can arrive before we have seen all item sizes in $U_F$. Assume without loss of generality that $U_F = \{s_1, s_2, \ldots, s_J\}$. The probability that the $i$th item size does not appear among the first $h$ items generated is $(1 - p_i)^h$. Thus, if we let $p_{\min} = \min\{p_i : 1 \le i \le J\}$, the probability that we have not seen all item sizes after the $h$th item arrives is at most

$$\sum_{i=1}^{J} (1 - p_i)^h \le J(1 - p_{\min})^h.$$

Let $t$ be such that $J(1 - p_{\min})^t \le 1/2$. Then for each integer $m \ge 0$, the probability that all the item sizes have not been seen after $mt$ items have arrived is at most $1/2^m$. Thus if $M$ is the number of items that have arrived when the last item size is first seen, we have that for each $m \ge 0$, the probability that $M \in (mt, (m+1)t]$ is at most $1/2^m$.

For (i), note that if $P$ is the packing that exists immediately after the last item size is first seen, then $ss(P) \le M^2$ and

$$E[ss(P)] \le \sum_{m=0}^{\infty} \Bigl( ((m+1)t)^2 \cdot p\bigl[M \in (mt, (m+1)t]\bigr] \Bigr) \le t^2 \sum_{m=0}^{\infty} \frac{(m+1)^2}{2^m} = 12t^2,$$

which is a constant bound depending only on $F$. After all sizes have been seen, SS' reduces to $SS_{D(U_F)}$, and it follows from Observation 3.2(a) that Lemma 2.2 applies to the latter. We thus can conclude that for any $n$ the packing $P_n$ satisfies

$$E[ss(P_n)] \le 12t^2 + 2n,$$

which by Lemma 2.3 implies that $EW_n^{SS'}(F) = O(\sqrt{n})$, so (i) is proved.
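The two numerical ingredients of this bound, the choice of $t$ and the value 12 of the series, are easy to check. The sketch below is illustrative; the function names are hypothetical, and the example values $p_{\min} = 1/2$, $J = 2$ correspond to the distribution with $B = 9$ and sizes 2 and 3 discussed earlier in this subsection.

```python
def smallest_t(p_min, J):
    """Smallest positive integer t with J * (1 - p_min)**t <= 1/2."""
    t = 1
    while J * (1 - p_min) ** t > 0.5:
        t += 1
    return t

def series_partial_sum(terms):
    """Partial sum of sum_{m >= 0} (m + 1)^2 / 2^m, which converges to 12."""
    return sum((m + 1) ** 2 / 2 ** m for m in range(terms))

t = smallest_t(0.5, 2)       # two equally likely sizes
bound = 12 * t ** 2          # constant bound on E[ss(P)] when the last size is first seen
```

For two equally likely sizes this gives $t = 2$ and hence the constant $12t^2 = 48$.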

The argument for (ii) mimics the proof of Theorem 3.4. Using the same potential function $\phi$ we show that Hajek's Lemma applies when SS is replaced by $SS_{D(U_F)}$, $F$ is a bounded waste distribution, and the initial state $x$ is taken to be the profile of the packing $P$ that exists immediately after the last item size is first seen by SS'.

To see that the Initial Bound Hypothesis is satisfied, we must show that there exists a constant $b > 1$ such that $E\left[b^{\phi(x)}\right]$ is bounded. To prove this, let $M$ be the number of items in packing $P$. It is immediate that $\phi(x) = \sqrt{\sum_{i=1}^{B-1} x_i^2} \le M$. Thus if we take $b = 2^{1/(2t)}$ and exploit the analysis used for (i) above we have

$$E\left[b^{\phi(x)}\right] \le \sum_{m=0}^{\infty} \frac{1}{2^m}\, b^{(m+1)t} = \sum_{m=0}^{\infty} \frac{2^{(m+1)/2}}{2^m} = \sqrt{2} \sum_{m=0}^{\infty} \frac{1}{2^{m/2}} < \infty.$$

Thus the Initial Bound Hypothesis is satisfied. The Bounded Variation Hypothesis again follows immediately from Lemma 3.5. To prove the Expected Decrease Hypothesis, we need the facts that Lemmas 2.2 and 3.7 hold when SS is replaced by $SS_{D(U_F)}$. We have already observed that Lemma 2.2 holds. As to Lemma 3.7, the properties of SS were used in only two places. First, we needed the fact that SS could never create a packing where the count for a dead-end level exceeded 1, an easy observation there since we assumed there were no nontrivial dead-end levels. Here there can be nontrivial dead-end levels, but this is not a problem since by Lemma 3.9 SS' can never create a packing where the count for a dead-end level is nonzero.

The other property of SS used in proving Lemma 3.7 was simply that, in the terms of the proof of that lemma, it could be trusted to pack an item of size $s = \ell_{q+1} - \ell_q$ in such a way as to reduce $ss(P)$ by at least as much as it would be reduced by placing the item in a bin of level $\ell_q$. $SS_{D(U_F)}$ will clearly behave as desired, since level $\ell_{q+1}$, as it is constructed in the proof, is not a dead-end level, and so bins of level $\ell_q$ are legal placements for items of size $\ell_{q+1} - \ell_q$ under $SS_{D(U_F)}$.

We conclude that Lemma 3.7 holds when $SS_{D(U_F)}$ replaces SS, and so the Expected Decrease Hypothesis of Hajek's Lemma is satisfied. Thus the latter Lemma applies, and the proof of bounded expected waste can proceed exactly as it did for SS. ■



3.3 The worst behavior of SS for bounded waste distributions 

Theorem 3.11 If $F$ is a bounded waste distribution that has nontrivial dead-end levels,
then $EW_n^{SS}(F) = \Theta(\log n)$.

We divide the proof of this theorem into separate upper and lower bound proofs. 
These are by a substantial margin the most complicated proofs in the paper, and 
readers may prefer to skip this section on a first reading of the paper. None of the later 
sections depend on the details of these proofs. 






3.3.1 Proof of the $O(\log n)$ Upper Bound



For this result we need to exploit more of the power of Hajek's Lemma (which surprisingly is used in proving the lower bound as well as the upper bound). We will also need a more complicated potential function. Let $\mathcal{D}_F$ denote the set of dead-end levels for $F$ and let $\mathcal{L}_F$ denote the set of levels that are not dead-end levels for $F$. We shall refer to the latter as live levels in what follows. For a given profile $x$, define $\tau_D(x) = \sum_{i \in \mathcal{D}_F} x_i^2$ and $\tau_L(x) = \sum_{i \in \mathcal{L}_F} x_i^2$. Note that $\phi(x) = \sqrt{\tau_D(x) + \tau_L(x)}$. Our new potential function $\psi$ must satisfy two key properties.

1. Hajek's Lemma applies with the potential function $\psi$ and, as before, $X_i$ representing the profile after SS has packed $i$ items generated according to $F$.

2. For any live level $h$,

$$\psi(x) \ge \sqrt{\tau_L(x)} \ge x_h. \qquad (3.7)$$



Let us first show that the claimed upper bound will follow if we can construct a potential function $\psi$ with these properties. Since Hajek's Lemma applies, there exist constants $c > 1$ and $T > 0$ such that for all $N > 0$,

$$E\left[c^{\psi(X_N)}\right] \le T. \qquad (3.8)$$



We can use (3.8) to separately bound the sums of the counts for live and dead levels. For each live level $h$, the component $X_{n,h}$ of the final packing profile $X_n$ satisfies $X_{n,h} \le \psi(X_n) \le c^{\psi(X_n)}/\log_e c$, and so we have

$$E\Bigl[\sum_{h \in \mathcal{L}_F} X_{n,h}\Bigr] \le E\left[B \cdot \frac{c^{\psi(X_n)}}{\log_e c}\right] \le \frac{BT}{\log_e c} = O(1). \qquad (3.9)$$

In other words, the expected sum of the counts for live levels is bounded by a constant.

To handle the dead-end levels, we begin by noting that (3.8) also implies that for all $N$ and all $a > 1$,

$$P\left[c^{\psi(X_N)} > aT\right] \le \frac{1}{a},$$

so if we take logarithms base $c$ and set $a = n^2/T$ we get

$$P\left[\psi(X_N) > 2\log_c n\right] \le \frac{T}{n^2}. \qquad (3.10)$$



Say that a placement is a major uphill move if it increases $ss(P)$ by more than $4\log_c n + 1$. By Observation 3.2(b) and (3.7), we know that whenever an item is generated according to $F$ and packed by SS, one option will be to start a new bin with a live level and hence, no matter where the item is packed, the increase in $ss(P)$ will be bounded by $2\psi(X_N) + 1$. Using (3.10), we thus can conclude that at any point in the packing process, the probability that the next placement is a major uphill move is at most $T/n^2$. Thus, in the process of packing $n$ items, the expected number of major uphill moves is at most $T/n$ by the linearity of expectations.
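For readers who want the intermediate steps, the passage from (3.8) to the bound on major uphill moves can be written out as below. Naming Markov's inequality is an addition here; the source only states the resulting bounds.

```latex
\begin{align*}
P\bigl[c^{\psi(X_N)} > aT\bigr]
  &\le \frac{E\bigl[c^{\psi(X_N)}\bigr]}{aT} \le \frac{T}{aT} = \frac{1}{a}
  && \text{(Markov's inequality and (3.8))} \\
P\bigl[\psi(X_N) > 2\log_c n\bigr]
  &= P\bigl[c^{\psi(X_N)} > n^2\bigr] \le \frac{T}{n^2}
  && \text{(taking } a = n^2/T\text{)} \\
P[\text{next placement is a major uphill move}]
  &\le P\bigl[2\psi(X_N) + 1 > 4\log_c n + 1\bigr] \le \frac{T}{n^2} \\
E[\text{number of major uphill moves in } n \text{ placements}]
  &\le n \cdot \frac{T}{n^2} = \frac{T}{n}
  && \text{(linearity of expectation)}
\end{align*}
```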

Now let us consider the dead-end levels. Suppose the count for dead-end level $h$ is $2B(\log_c n + 1)$ or greater and a bin $b$ with level less than $h$ receives an additional item that brings its level up to $h$. We claim that bin $b$, in the process of attaining this level from the time of its initial creation, must have at one time or another experienced an item placement that was a major uphill move.

To see this, let us first recall the tie-breaking rule used by SS when it must choose 
between bins with a given level for packing the next item. Although the rule chosen 
has no effect on the amount of waste created, our definition of SS specified a particular 
rule, both so the algorithm would be completely defined and because the particular 
rule chosen facilitates the bookkeeping needed for this proof. The rule says that when 
choosing which bin of a given level h to place an item in, we always pick the bin which 
most recently attained level h. In other words, the bins for each level will act as a 
stack, under the "last-in, first-out" rule. Now consider the bin $b$ mentioned above. In the process of reaching level $h$ it received less than $B$ items, so it changed levels fewer than $B$ times. Note also that by our tie-breaking rule above, we know that every time the bin left a level, that level had the same count that it had when the bin arrived at the level. Thus at least one of the steps in packing bin $b$ must have involved a jump from a level $i$ to a level $j$ such that $N_P(j) \ge N_P(i) + 2(\log_c n + 1)$. By Lemma 1.2 this means that the move caused $ss(P)$ to increase by at least $4(\log_c n + 1) + 1 > 4\log_c n + 1$ and hence was a major uphill move. We conclude that

$$E\Bigl[\sum_{h \in \mathcal{D}_F} \bigl(X_{n,h} - 2B(\log_c n + 1)\bigr)\Bigr] \le E\Bigl[\sum_{h \in \mathcal{D}_F} \bigl( (X_{n,h} - 2B(\log_c n + 1)) : X_{n,h} > 2B(\log_c n + 1) \bigr)\Bigr] \le E[\text{Number of major uphill moves}] \le \frac{T}{n}$$

and consequently

$$E\Bigl[\sum_{h \in \mathcal{D}_F} X_{n,h}\Bigr] \le 2B^2(\log_c n + 1) + \frac{T}{n} = O(\log n) \qquad (3.11)$$

for fixed $F$. Combining (3.9) with (3.11), we conclude that

$$EW_n^{SS}(F) \le E\Bigl[\sum_{h \in \mathcal{D}_F} X_{n,h}\Bigr] + E\Bigl[\sum_{h \in \mathcal{L}_F} X_{n,h}\Bigr] = O(\log n).$$



Thus all that remains is to exhibit a potential function $\psi$ that obeys (3.7) and the three hypotheses of Hajek's Lemma. Our previous potential function $\phi(x) = \sqrt{\tau_L(x) + \tau_D(x)}$ obeys (3.7) and the Initial Bound and Bounded Variation Hypotheses. Unfortunately, it doesn't obey the Expected Decrease Hypothesis for all bounded waste distributions $F$ with nontrivial dead-end levels. There can exist realizable packings in which the count for the largest dead-end level is arbitrarily large (and hence so is $\phi(x)$), and yet any item with size in $U_F$ will cause $\phi(x)$ to increase. One can avoid such obstacles by taking instead the potential function $\psi$ to be $\sqrt{\tau_L(x)}$, the variant on $\phi$ that simply ignores the dead-end level counts. This function unfortunately fails to obey the Expected Decrease Hypothesis for a different reason. There are relevant situations in which any item with a size in $U_F$ will either cause an increase in $\tau_L(x)$ or else go in a bin with a dead-end level and thus leave $\tau_L(x)$ unchanged.

Thus our potential function must somehow deal with the effects of items going into dead-end level bins. Let us say that a profile $x'$ is constructible from a profile $x$ under $F$ if there is a way of adding items with sizes in $U_F$ to dead-end level bins of a packing with profile $x$ so that a packing with profile $x'$ results. Let

$$\tau_0(x) = \min\{\tau_D(x') : x' \text{ is constructible from } x \text{ under } F\} \qquad (3.12)$$

Note for future reference that $\tau_0(x)$ can never decrease as items are added to the packing. Now let

$$r_D(x) = \tau_D(x) - \tau_0(x) \qquad (3.13)$$

Thus $r_D(x)$ is the amount by which we can reduce $\tau_D(x)$ by adding items with sizes in $U_F$ into bins with dead-end levels. Our new potential function is

$$\psi(x) = \sqrt{\tau_L(x) + r_D(x)} \qquad (3.14)$$

Note that since we must always have $r_D(x) \ge 0$, we have $\psi(x) \ge \sqrt{\tau_L(x)}$ and so (3.7) holds for $\psi$. It remains to be shown that Hajek's Lemma applies to $\psi$. This is significantly more difficult than showing it applies to $\phi$ when $F$ has no dead-end levels.

First we prove a technical lemma that will help us understand the intricacies of the $r_D(x)$ part of our potential function $\psi$. Recall that if $r_D(x) = t$, then there is some list $L$ of items with sizes in $U_F$ that we can add to the dead-end level bins of a packing with profile $x$ to get to one with a profile $y$ such that $\tau_D(y) = \tau_D(x) - t$, and no such list of items can yield a profile $y'$ with $\tau_D(y') < \tau_D(x) - t$. In what follows, we will use an equivalent graph-theoretic formulation based on the following definition.

Definition 3.12 A reduction graph $G$ for $F$ is a directed multigraph whose vertices are the dead-end levels for $F$ and for which each arc $(h,i)$ is such that $i - h$ can be decomposed into a sum of item sizes from $U_F$. Such a graph $G$ is applicable to a profile $x$ if $\mathrm{outdegree}_G(i) \le x_i$ for all dead-end levels $i$. The profile $G[x]$ derived from applying $G$ to $x$ is the vector $y$ that has $y_i = x_i + \mathrm{indegree}_G(i) - \mathrm{outdegree}_G(i)$ for all dead-end levels and $y_i = x_i$ for all live levels. We say that $G$ verifies $t$ for $x$ if $\tau_D(x) - \tau_D(G[x]) \ge t$.

Note that $r_D(x)$ equals the maximum $t$ verified for $x$ by some applicable reduction graph $G$. The list $L$ corresponding to $G$, i.e., the one that can be added to $x$ to obtain $G[x]$, is a union of sets of items of total size $i - h$ for each arc $(h,i)$ in $G$.
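Definition 3.12 translates directly into code. The sketch below is illustrative only: the function names are hypothetical, profiles are stored as dictionaries from level to count, and the set of dead-end levels is supplied by the caller. The example uses a hypothetical instance with $B = 10$ and item sizes 2 and 5, for which levels 7 and 9 leave gaps 3 and 1 that cannot be filled.

```python
from collections import Counter

def is_sum_of_sizes(total, sizes):
    """True if total (>= 1) can be written as a sum of item sizes, repeats allowed."""
    ok = [True] + [False] * total
    for t in range(1, total + 1):
        ok[t] = any(s <= t and ok[t - s] for s in sizes)
    return ok[total]

def apply_reduction_graph(x, arcs, dead, sizes):
    """Return G[x] for the multigraph given as a list of arcs (h, i).

    Raises ValueError if an arc is not valid for a reduction graph or if the
    graph is not applicable to x (some outdegree exceeds the level's count).
    """
    for h, i in arcs:
        if h not in dead or i not in dead or i <= h or not is_sum_of_sizes(i - h, sizes):
            raise ValueError(f"({h}, {i}) is not a valid arc")
    out_deg = Counter(h for h, _ in arcs)
    in_deg = Counter(i for _, i in arcs)
    if any(out_deg[h] > x.get(h, 0) for h in dead):
        raise ValueError("graph is not applicable to this profile")
    y = dict(x)                      # live levels are copied unchanged
    for h in dead:
        y[h] = x.get(h, 0) + in_deg[h] - out_deg[h]
    return y

def tau_D(x, dead):
    """Sum of squared counts over the dead-end levels."""
    return sum(x.get(h, 0) ** 2 for h in dead)
```

With `x = {2: 1, 7: 3, 9: 0}` and the single arc `(7, 9)`, the derived profile has counts 2 and 1 at levels 7 and 9, so this graph verifies $t = 4$ for `x`. Adding a second copy of the arc yields no further reduction.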

Lemma 3.13 Let $G$ be a reduction graph with the minimum possible number of arcs that verifies $r_D(x)$ for $x$. Then the following three properties hold:

(i) No vertex in $G$ has both a positive indegree and a positive outdegree.

(ii) Suppose that the arcs of $G$ are ordered arbitrarily as $a_1, a_2, \ldots, a_m$, that we inductively define a sequence of profiles $y[0] = x, \ldots, y[m]$ by saying that $y[i]$ is derived by applying the graph consisting of the single arc $a_i$ to $y[i-1]$, $1 \le i \le m$, and that we define $\Delta[i] = \tau_D(y[i-1]) - \tau_D(y[i])$, $1 \le i \le m$. Then

$$\sum_{i=1}^{m} \Delta[i] = r_D(x) \quad \text{and} \qquad (3.15)$$

$$\Delta[i] > 0, \quad 1 \le i \le m. \qquad (3.16)$$

(iii) $G$ contains fewer than $\psi(x)$ copies of any arc $(h,i)$.



Proof. If (i) did not hold, there would be a pair of arcs $(h,i)$ and $(i,j)$ in $G$ for some $h,i,j$. But note that then the graph $G'$ with these two arcs replaced by $(h,j)$ would also verify $r_D(x)$ for $x$, and would have one less arc, contradicting our minimality assumption.

For (ii), equality (3.15) follows from a collapsing sum argument and the fact that $y[m] = G[x]$. The proof of (3.16) is a bit more involved. Suppose there were some $k$ such that $\Delta[k] \le 0$. We shall show how this leads to a contradiction. Consider the result of deleting arc $a_k = (h,j)$ from $G$, thus obtaining new graph $G'$ and new sequences $y[i]'$ and $\Delta[i]'$, $1 \le i \le m-1$. We will show that $G'$ also verifies $r_D(x)$ for $x$, contradicting our minimality assumption.

Note that $y[i]' = y[i]$, $1 \le i < k$, and hence $\Delta[i]' = \Delta[i]$ for $1 \le i < k$. Thereafter the only difference between $y[i+1]$ and $y[i]'$ is that $y[i]'_h = y[i+1]_h + 1$ and $y[i]'_j = y[i+1]_j - 1$. Suppose $i \ge k$ and that $a_{i+1} = (r,q)$. Note that by (i), $r \ne j$ and $q \ne h$. Thus we have $y[i]'_r \ge y[i+1]_r$ and $y[i]'_q \le y[i+1]_q$ and by Lemma 1.2 (noting that $\Delta[i]$ as defined is $-1$ times the quantity evaluated in that lemma),

$$\Delta[i]' = 2\left(y[i]'_r - y[i]'_q - 1\right) \ge 2\left(y[i+1]_r - y[i+1]_q - 1\right) = \Delta[i+1].$$

Thus we have by (3.15)

$$\sum_{i=1}^{m-1} \Delta[i]' \ge \sum_{i=1}^{m} \Delta[i] - \Delta[k] \ge \sum_{i=1}^{m} \Delta[i] = r_D(x)$$

and so $G'$ verifies $r_D(x)$ for $x$. Since $G'$ has one less arc than $G$, this violates our assumption about the minimality of $G$ and so yields our desired contradiction, thus proving (3.16).

Finally, let us consider (iii). Suppose there were $\psi(x)$ copies of some arc $(h,j)$ in $G$. By (ii) we may assume that these are arcs $a_1, a_2, \ldots, a_{\psi(x)}$, and that each yields an improvement in $\tau_D$. Thus when the last is applied, the count for level $h$ must have been at least 2 more than the count for level $j$, and inductively, when arc $a_{\psi(x)+1-i}$ was applied, the difference in counts had to be at least $2i$. Now by Lemma 1.2, if the count for level $h$ exceeds that for level $j$ by $\delta$, then the decrease in $\tau_D$ caused by applying the arc is $2\delta - 2$. Thus by (ii) we have

$$\psi(x)^2 \ge r_D(x) \ge \sum_{i=1}^{\psi(x)} (4i - 2) = 2\psi(x)^2,$$

a contradiction. Thus (iii) and Lemma 3.13 have been proved. ■
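The closed form used in the last display is a routine arithmetic-series computation, spelled out here for convenience (with $n$ standing for the number of copies of the arc, a symbol introduced only for this remark):

```latex
\begin{align*}
\sum_{i=1}^{n} (4i - 2)
  = 4 \cdot \frac{n(n+1)}{2} - 2n
  = 2n^2 + 2n - 2n
  = 2n^2 .
\end{align*}
```

Each term $4i - 2$ is the decrease $2\delta - 2$ from Lemma 1.2 evaluated at the minimum possible difference $\delta = 2i$.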

Now let us turn to showing that Hajek's Lemma applies when $\psi$ plays the role of $\phi$. Since the initial state is the empty packing, for which $b^{\psi(x)} = 1$, the Initial Bound Hypothesis is trivially satisfied. For the Bounded Variation Hypothesis we must show that there is a fixed bound $\Delta$ on $|\psi(x') - \psi(x)|$, where $x$ is any profile that can occur with positive probability in an SS packing under $F$ and $x'$ is any profile that can be obtained by adding an item with size $s \in U_F$ to a packing with profile $x$ using SS. We will show this for $\Delta = 10B$. We may assume without loss of generality that $B \ge 2$, as otherwise $EW_n^{SS}(F) = 0$ for all $n$.

There are two cases, depending on whether $\psi(x') > \psi(x)$. First suppose $\psi(x') > \psi(x)$. By Lemma 3.6 it suffices to prove that $\psi(x')^2 - \psi(x)^2 \le 2\Delta\psi(x) = 20B\psi(x)$. By Observation 3.2(b) we know that $s$ is not a dead-end level and hence by (3.7) $x_s \le \psi(x)$. Thus by the operation of SS and the fact that $\tau_0(x)$ cannot decrease, the increase in $\psi(x)^2$ is at most $(x_s + 1)^2 - x_s^2 = 2x_s + 1$. If $x_s = 0$, this is clearly less than $10B$. Otherwise, we have $\psi(x) \ge \sqrt{\tau_L(x)} \ge 1$, and so $2x_s + 1 \le 3\psi(x) < 20B\psi(x)$, as desired.

Suppose on the other hand that $\psi(x') < \psi(x)$, a significantly more difficult case. We need to show that $\psi(x) - \psi(x') \le \Delta = 10B$. Lemma 3.6 again applies, but now requires that we show $\psi(x)^2 - \psi(x')^2 \le 2\Delta\psi(x')$, where the bound is in terms of the resulting profile $x'$ rather than the initial one $x$. To simplify matters, we shall first show that the former is within a constant factor of the latter. This is not true in general, but we may restrict attention to a case where it provably is true. In particular we may assume without loss of generality that $\psi(x) > 10B$, since otherwise it is obvious that any placement will reduce $\psi(x)$ by at most $10B$.

Lemma 3.14 Suppose $F$ is a bounded waste distribution with $B \ge 2$, $x$ is a profile with $\psi(x) > 10B$, and $x'$ is the profile resulting from using SS to place an item of size $s \in U_F$ into a packing with profile $x$. Then $\psi(x') \ge \psi(x)/2$.

Proof. By hypothesis, $\tau_L(x) + r_D(x) > 100B^2$. We break into cases depending on the relative values of $\tau_L(x)$ and $r_D(x)$.

Suppose $\tau_L(x) \ge r_D(x)$, in which case $\tau_L(x) \ge \psi(x)^2/2 > 50B^2$. If the new item goes into a dead-end level bin, then $\tau_L(x)$ remains unchanged and $\psi(x') \ge \sqrt{\psi(x)^2/2} > .707\psi(x) > \psi(x)/2$. If on the other hand the new item goes into a bin with a live level, say $h$, then $\tau_L(x)$ will decline by at most $2x_h - 1$.

We now break into two further subcases. If $2x_h - 1 \le \tau_L(x)/2$, then we will have $\tau_L(x') \ge \tau_L(x)/2 \ge \psi(x)^2/4$ and so $\psi(x') \ge \sqrt{\psi(x)^2/4} = \psi(x)/2$. If $2x_h - 1 > \tau_L(x)/2$, then $x_h > \tau_L(x)/4 > 12.5B^2$. But this means that

$$\frac{2x_h - 1}{\tau_L(x)} \le \frac{2x_h}{x_h^2} \le \frac{2(12.5B^2)}{(12.5B^2)^2} \le \frac{2}{50}.$$

Thus $\tau_L(x') \ge .96\tau_L(x) \ge .48\psi(x)^2$ and $\psi(x') \ge \sqrt{.48\psi(x)^2} > .69\psi(x) > \psi(x)/2$. Thus when $\tau_L(x) \ge r_D(x)$ we have $\psi(x') \ge \psi(x)/2$ in all cases.

Now suppose that $\tau_L(x) < r_D(x)$, in which case $r_D(x) > \psi(x)^2/2 > 50B^2$. Consider the bin in which the new item is placed. If its new level is a live level, then so must have been its original level. Thus $r_D(x)$ is unchanged, and we have $\psi(x') \ge \sqrt{\psi(x)^2/2} > .707\psi(x) > \psi(x)/2$.

The only case remaining is when $r_D(x) > \psi(x)^2/2 > 50B^2$ and the new item increases the level of the bin that receives it to a dead-end level. Thus the count for one dead-end level increases by 1. Let us denote this level by $h^+$. If the item was placed in a bin with a live level, that is the only change in the dead-end level counts. Otherwise, an additional one of those counts (the one corresponding to the original level of the bin into which the item was placed) will decrease by 1. Let $h^-$ denote this level if it exists.

In the terms of Lemma 3.13, let $G$ be a minimum-arc graph that verifies $r_D(x)$ for $x$. Let $G'$ equal $G$ if $h^-$ doesn't exist or if $\mathrm{outdegree}_G(h^-) < x_{h^-}$. Otherwise let $G'$ be a graph obtained by deleting one of the out-arcs leaving $h^-$ in $G$. In both cases, $G'$ will be applicable to $x'$. Order the arcs of $G$ so that the deleted arc (if it exists) comes last, preceded by all the other arcs out of $h^-$ (if they exist), preceded by the arcs into $h^+$ (if they exist), preceded by all remaining arcs, and let the arcs of $G'$ occur in the same order as they do in $G$. Let us now see what happens when we apply $G'$ to $x'$, and how this differs from what happens when we apply $G$ to $x$.

Let $\delta(a)$ be the change in $\tau_D$ due to the application of arc $a$ when $G$ is being applied to $x$, and let $\delta'(a)$ be the change when $G'$ is applied to $x'$. By Lemma 3.13 and the definition of $\tau_0$ we have

$$r_D(x) = \sum_{a \in G} \delta(a) \qquad (3.17)$$

$$r_D(x') \ge \sum_{a \in G'} \delta'(a). \qquad (3.18)$$

Thus to complete the proof of Lemma 3.14, it will suffice to show that

$$\sum_{a \in G'} \delta'(a) \ge c \sum_{a \in G} \delta(a) \qquad (3.19)$$

for an appropriate constant $c$.

Consider an arc $a = (i,j)$ in $G$ and let $n_i(a)$ and $n_j(a)$ ($n'_i(a)$ and $n'_j(a)$) be the corresponding level counts when $a$ is applied during the course of applying $G$ to $x$ ($G'$ to $x'$). By Lemma 1.2 and the fact that since $i$ and $j$ are dead-end levels neither can be 0 or $B$, we have $\delta(a) = 2(n_i(a) - n_j(a)) - 2$ and $\delta'(a) = 2(n'_i(a) - n'_j(a)) - 2$.

Let $h$ be one of $i,j$. Observe that if $h \notin \{h^+, h^-\}$, then $n'_h(a) = n_h(a)$, if $h = h^+$ then $n'_h(a) = n_h(a) + 1$, and if $h = h^-$ then $n'_h(a) = n_h(a) - 1$. Thus the only arcs $a = (i,j)$ for which $\delta'(a) < \delta(a)$ are those with $i = h^-$, $j = h^+$ or both. If only one of the two holds, then $\delta'(a) = \delta(a) - 2$. If both hold then $\delta'(a) = \delta(a) - 4$. As a notational convenience, let $A^*$ denote the set of deleted arcs. (Note that $A^*$ will either be empty or contain a single arc.) Then we have

$$\sum_{a \in G'} \delta'(a) \ge \sum_{a \in G} \delta(a) - 2\left(\mathrm{indegree}_G(h^+) + \mathrm{outdegree}_G(h^-)\right) - \sum_{a \in A^*} \delta_G(a), \qquad (3.20)$$

where $\mathrm{outdegree}_G(h^-)$ is taken by convention to be 0 if $h^-$ does not exist.

Let us deal with that last term first. If there is an arc $a^* = (i,j)$ in $A^*$ then by our ordering of arcs in $G$ it is the last arc. Suppose $\delta_G(a^*) = 2(n_i(a^*) - n_j(a^*)) - 2 > 4$. Then we have $n_i(a^*) - n_j(a^*) > 3$. But this means that after the arc is applied we will have $N_P(i) - N_P(j) \ge 2$, and so it would be possible to apply an additional arc $(i,j)$ and this would further decrease $\tau_D$ by at least 2. But this contradicts our choice of $G$ as a graph whose application to $x$ yielded the maximum possible decrease in $\tau_D(x)$. So we can conclude that

$$\delta_G(a^*) \le 4 \le 2B. \qquad (3.21)$$

Now let us consider the rest of the right hand side of (3.20). Let $M = \mathrm{indegree}_G(h^+) + \mathrm{outdegree}_G(h^-)$. If $M \le 10B$, then

$$\sum_{a \in G} \delta(a) - \sum_{a \in G'} \delta'(a) \le 2M + 2B \le 22B \le .11\psi(x)^2$$

since by assumption $\psi(x)^2 > 100B^2 \ge 200B$. Thus by (3.17), (3.18), and our assumption that $r_D(x) > \psi(x)^2/2$,

$$r_D(x') > .39\psi(x)^2$$

and hence $\psi(x') > .624\psi(x) > \psi(x)/2$.

Thus we may assume that $M > 10B$. Let $A_h$ denote the multiset of arcs $(i,j)$ in $G$ with $i = h^-$ or $j = h^+$ or both, and let us say that a pair $\langle i,j \rangle$ of dead-end levels is a valid pair if $A_h$ contains at least one arc $(i,j)$. Note that there can be at most $B - 1$ valid pairs, since by Lemma 3.13 no vertex in $G$ can have both positive indegree and positive outdegree.

Suppose $\langle i,j \rangle$ is a valid pair and there are $m$ copies of arc $(i,j)$ in $A_h$. By Lemma 3.13 each copy must decrease $\tau_D$ when it is applied, so if we let the last copy of $(i,j)$ in our defined order be $a_1$, the next-to-last be $a_2$, etc., we will have $\delta_G(a_k) \ge 2$, $1 \le k \le m$. Moreover, since an application of an arc $(i,j)$ reduces $N_P(i) - N_P(j)$ by at least 1, and since by Lemma 3.13 applications of other arcs cannot increase $N_P(i)$ or decrease $N_P(j)$, we must in fact have $\delta_G(a_k) \ge \delta_G(a_{k-1}) + 2$, $2 \le k \le m$. If $(i,j) = (h^-, h^+)$ then each application reduces $N_P(i) - N_P(j)$ by 2, and so in this case $\delta_G(a_k) \ge \delta_G(a_{k-1}) + 4$, $1 < k \le m$. Thus

$$\sum_{k=1}^{m} \delta_G(a_k) \ge \begin{cases} \sum_{k=1}^{m} (2k) = m(m+1), & i = h^- \text{ or } j = h^+ \text{ but not both} \\ \sum_{k=1}^{m} (4k - 2) = 2m^2, & i = h^- \text{ and } j = h^+ \end{cases}$$

Since $2m^2 \ge m(m+1)$ for all $m \ge 1$, we thus have


k=l 



M 



B-1 



M 



B-1 



+ 1 > 



M2 



B- 1 



M 



(3.22) 



Then by (^), and our assumption that M > lOB, we have 



Ea6G '^(«) - Eaec '^'(«) < 2M + 2B_ 2 + 



< 



B_ 

M 



Ml. — 

B-1 

2.1(5 



M 



M 
B-1 



1 



(3.23) 



1) ^ 2.1(5-1) 



<M<.ni 
- 19 - 



M - 5 + 1 - 95+1 
Thus by (3.17) and (3.18) and our assumption that $r_D(x) > \psi(x)^2/2$, we have

$$r_D(x') \ge .889\,r_D(x) \ge .444\psi(x)^2$$

and hence $\psi(x') \ge \sqrt{r_D(x')} > .666\psi(x) > \psi(x)/2$. Thus in all cases we have $\psi(x') \ge \psi(x)/2$ and Lemma 3.14 is proved. ■

Returning to the proof that Hajek's Lemma applies, recall that we are in the midst 
of proving that the Bounded Variation Hypothesis holds, and are left with the task 
of showing that ip{x) — ip{x') < 105 in the case where ip{x') < ■ip{x). By Lemma p.6| 
it will suffice to show that ip{xy — il){x'Y ^ 205'?/^(x') when %p{x) > 105, which by 
Lemma p.l4| will follow if we can show that 

ip{xf-ij{x'f < 10Bij{x) 



(3.24) 



As in the proof of Lemma p.l4| , we divide the difference ■?/'( 



X? 



il){x' 



into two 



parts that we will treat separately: tl{x) — tl{x') and r£)(x) — rp){x'). 
We begin by bounding the first part. If the item being packed goes in an empty bin,
then a live level gets increased and no dead-end level is changed, so $\psi(x)$ increases,
contrary to hypothesis. If the item being packed goes into a bin with a dead-end
level, then $\tau_L(x)$ remains unchanged. If the item goes into a bin with a live level
$h$, then by (3.7) we have that $x_h \le \psi(x)$, so by Lemma 1.2 the decrease in $\tau_L$ is at
most $2x_h - 1 < 2\psi(x) \le B\psi(x)$. Thus to prove (3.24) it will suffice to prove that
$r_D(x) - r_D(x') < 9B\psi(x)$.

To bound this second difference, note first that the hypotheses of Lemma 3.14 hold.
So as in the proof of that Lemma, let $G$ be a graph that verifies $r_D(x)$. If the placement
of the item changes no dead-end level counts, there is nothing to prove, so we again
may assume that there is a dead-end level $h^+$ that increases by 1 and (possibly) a
dead-end level $h^-$ that decreases by 1. As in the proof of the Lemma we have

$$r_D(x) - r_D(x') \le 2\left(\mathrm{indegree}_G(h^+) + \mathrm{outdegree}_G(h^-)\right) + 2B \qquad (3.25)$$

where by convention $\mathrm{outdegree}_G(h^-)$ is taken to be 0 if $h^-$ doesn't exist.

Also, as in the proof of Lemma 3.14, there are at most $B - 1$ distinct pairs $\langle i,j \rangle$
such that $(i,j)$ is an arc of $G$ and $i = h^-$, $j = h^+$, or both. But then by Lemma
3.13(iii) we have fewer than $\psi(x)$ copies of each. Given that arcs $(h^-,h^+)$ will be
double counted in $\mathrm{indegree}_G(h^+) + \mathrm{outdegree}_G(h^-)$, we thus have

$$\mathrm{indegree}_G(h^+) + \mathrm{outdegree}_G(h^-) < B\psi(x)$$

Combining this with (3.25) we conclude that

$$r_D(x) - r_D(x') < 2B\psi(x) + 2B < 9B\psi(x)$$

We thus conclude (3.24) holds and hence so does the Bounded Variation Hypothesis.

To complete the proof that Hajek's Lemma applies, all that remains is to show that
the Expected Decrease Hypothesis holds. Essentially the same proof that was used
when there were no nontrivial dead-end levels will work, except that Lemma 3.7 needs
to be modified to account for the possibility of such levels and we need to show that
both it and Lemma 2.2 hold for $\psi(x)^2$.

This is straightforward for Lemma 2.2, which essentially says that assuming F is
a perfectly packable distribution, the expected increase in $\phi(x)^2$ that can result from
using SS to pack an item generated according to F is less than 2. This will hold for
$\psi(x)^2$ as well since by definition

$$\psi(x)^2 = \tau_L(x) + r_D(x) = \tau_L(x) + \tau_D(x) - \tau_0(x) = \phi(x)^2 - \tau_0(x),$$

and by definition $\tau_0(x)$ can never decrease.

As to Lemma 3.7, we need only modify it by increasing the two key constants involved.
The precise values of these constants are not relevant to satisfying the Expected
Decrease Hypothesis. In particular, we can prove the following variant on Lemma 3.7.






Lemma 3.15 Let F be a bounded waste distribution and let P be any packing that can
be created by applying SS to a list of items all of whose sizes are in $U_F$. If $x$ is the
profile of P and $\psi(x) \ge 2\sqrt{2}B^{3/2}$, then there is a size $s \in U_F$ such that if an item of
size $s$ is packed by SS into P, the resulting profile $x'$ satisfies
Proof. Since $\tau_0(x)$ can never decrease, the result will follow if we can show that there
exists an item size $s$ such that if an item of size $s$ is packed by SS, $\psi(x)^2 + \tau_0(x) =
\tau_L(x) + \tau_D(x) = ss(P)$ will decline by at least $\psi(x)/B^2$.

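Suppose $\tau_L(x) \ge \psi(x)^2/2$. Then as in the argument used in the proof of Lemma
3.7 there has to be a live level $h$ with $x_h \ge \sqrt{\tau_L(x)/B} \ge \psi(x)/\sqrt{2B}$ and hence a size
$s$ that will cause $ss(P)$ to decline by at least

$$2\left(\frac{x_h}{B} - 1\right) \ge 2\left(\frac{\psi(x)}{\sqrt{2}B^{3/2}} - 1\right) = \frac{\psi(x)}{\sqrt{2}B^{3/2}} + \frac{\psi(x) - 2\sqrt{2}B^{3/2}}{\sqrt{2}B^{3/2}} \ge \frac{\psi(x)}{\sqrt{2}B^{3/2}} \ge \frac{\psi(x)}{B^2}.$$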

Suppose on the other hand that $\tau_L(x) < \psi(x)^2/2$. In this case we must have $r_D(x) >
\psi(x)^2/2$. Let $G$ be a minimum-arc reduction graph that verifies $r_D(x) > \psi(x)^2/2$, and
suppose $G$ contains $m$ arcs, ordered as $a_1, a_2, \ldots, a_m$. By Lemma 3.13(i),(iii), we know
that $m < (B - 1)\psi(x)$. Thus by Lemma 3.13(ii) we know that for some $i$, $1 \le i \le m$,

$$\Delta[i] \ge \frac{r_D(x)}{m} \ge \frac{\psi(x)^2}{2m} \ge \frac{\psi(x)^2}{2B\psi(x)} = \frac{\psi(x)}{2B} \ge \frac{\psi(x)}{B^2},$$

where recall that $\Delta[i]$ is defined to be the reduction in $\tau_D$ when the arc $a_i$ is applied to
the intermediate profile $y[i-1]$, created by the application of earlier arcs in sequence
to $x$. Suppose arc $a_i = (h,j)$. Now by Lemma 3.13(i), the fact that $h$ is the source
of arc $a_i$ means that it cannot have been a sink of a previous arc, so we must have
$y[i-1]_h \le x_h$. Similarly the fact that $j$ is the sink of arc $a_i$ means that it cannot be
the source of any previous arc, so $y[i-1]_j \ge x_j$. But then the reduction in $\tau_D$ that
would be obtained if $a_i$ were applied directly to $x$, i.e., if an item of size $j - h$ is placed
in a bin of level $h$, is by Lemma 1.2

$$2(x_h - x_j - 1) \ge 2(y[i-1]_h - y[i-1]_j - 1) = \Delta[i] \ge \frac{\psi(x)}{B^2}$$
Thus, SS will place an item of size $s = j - h$ in such a way as to reduce $ss(P)$ by at
least this much. ■

The remainder of the proof that the Expected Decrease Hypothesis is satisfied by $\psi(x)$
proceeds just as the proof for $\phi(x)$ did when there were no nontrivial dead-end levels.

Thus Hajek's Lemma applies and the upper bound of Theorem 3.11 is proved.
3.3.2 Proof of the $\Omega(\log n)$ Lower Bound.



We begin the proof with a sequence of lemmas. 



Lemma 3.16 Suppose s is a divisor of the bin size B. Then if an item of size s is 
placed into a packing P using SS, the value of ss{P) can increase by at most 1. 

Proof. If there is a bin of level $B - s$, then placing an item of size $s$ into that bin would
decrease $ss(P)$. If there is no bin of level $s$, then starting a new bin with an item of
size $s$ will increase $ss(P)$ by 1. Otherwise, let $h_s = \max\{h : s \mid h \text{ and } N_P(h) > 0\}$, and
note by assumption that $h_s \le B - 2s$. By Lemma 1.2, placing an item of size $s$ in one
of the bins with level $h_s$ increases $ss(P)$ by at most $2(N_P(h_s + s) - N_P(h_s)) + 2 \le 0$.
Thus in every case there is a way to increase $ss(P)$ by 1 or less, and so SS must choose
a move that increases $ss(P)$ by at most 1. ■
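The case analysis in this proof can be checked exhaustively on small instances. The following is a minimal sketch, not part of the original analysis: the helper names `best_ss_increase` and `check_lemma_3_16` are hypothetical, and a packing is represented only by its level counts, with $ss(P)$ taken as the sum of squared counts over levels 1 through $B-1$.

```python
from itertools import product

def best_ss_increase(counts, B, s):
    """Smallest change in ss(P) = sum_h N_P(h)^2 achievable by adding one item of size s.

    counts[h] is the number of bins at level h for 1 <= h <= B-1 (counts[0] is unused).
    """
    best = None
    for h in range(0, B - s + 1):           # h = 0 means "start a new bin"
        if h > 0 and counts[h] == 0:
            continue                         # no bin of this level can receive the item
        delta = 0
        if h > 0:                            # the source level loses one bin
            delta += (counts[h] - 1) ** 2 - counts[h] ** 2
        if h + s < B:                        # the target level gains one bin (full bins are not counted)
            delta += (counts[h + s] + 1) ** 2 - counts[h + s] ** 2
        best = delta if best is None else min(best, delta)
    return best

def check_lemma_3_16(B, max_count=2):
    """Check every profile with level counts in 0..max_count and every divisor s of B."""
    divisors = [s for s in range(1, B) if B % s == 0]
    for profile in product(range(max_count + 1), repeat=B - 1):
        counts = (0,) + profile
        if any(best_ss_increase(counts, B, s) > 1 for s in divisors):
            return False
    return True
```

For a size that does not divide $B$ the bound can fail: with $B = 5$, $s = 2$, and one bin each at levels 2 and 4, every legal move increases $ss(P)$ by at least 2.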

Let us say that a level $h$ is divisible for F if any set of items with sizes in $U_F$ that
has total size $h$ can contain only items whose sizes are divisors of $B$.
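The definition of a divisible level can be tested mechanically. The sketch below is illustrative only: the function name `is_divisible_level` is hypothetical and `sizes` stands in for the set $U_F$. A level that cannot be reached by any set of items is treated as vacuously divisible.

```python
def is_divisible_level(h, sizes, B):
    """True if every multiset of sizes from `sizes` with total h uses only divisors of B."""
    # reachable[t] is True when some multiset of sizes sums to exactly t.
    reachable = [False] * (h + 1)
    reachable[0] = True
    for t in range(1, h + 1):
        reachable[t] = any(s <= t and reachable[t - s] for s in sizes)
    # h fails to be divisible exactly when some non-divisor size s can be
    # completed to total h by further items.
    return not any(B % s != 0 and s <= h and reachable[h - s] for s in sizes)
```

For example, with $B = 6$ and sizes $\{2, 3, 4\}$, level 4 is not divisible (it can consist of a single item of size 4), while level 5 is divisible (its only decomposition is $2 + 3$).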

Lemma 3.17 If h is a nontrivial dead-end level for F then h is not divisible for F. 

Proof. Let $\mathcal{H}$ be the set of all levels $i$, $1 \le i \le B - 1$, that are divisible for F and
assume, for the sake of contradiction, that $h \in \mathcal{H}$. Since $h$ is a nontrivial dead-end
level for F, there is some list $L$ that under SS yields a packing containing at least two
bins with level $h$. Consider the first time during the packing of $L$ that a level $i \in \mathcal{H}$
had its count $N_P(i)$ increase from 1 to 2, and let $s$ be the size of the item $x$ whose
placement caused this to happen. By definition of divisible level, $s$ must be a divisor of
$B$, and so by Lemma 3.16, the placement of $x$ can have increased $ss(P)$ by at most 1.
But this is impossible: If $i = s$ then the insertion of $x$ would have increased $ss(P)$ by
$2^2 - 1^2 = 3$. On the other hand, suppose $i > s$. Since $i$ is a divisible level, so is $i - s$.
Thus $N_P(i - s) = N_P(i) = 1$ just before $x$ was packed: Neither count can exceed 1 by
our choice of $i$, the latter must be 1 if it is to increase to 2 after the placement of $x$,
and the former must be 1 since $x$ can only create a bin with level $i$ if there is a bin of
level $i - s$ into which it can be placed. However, this means that $ss(P)$ increases by 2,
contradicting Lemma 3.16. So $h \notin \mathcal{H}$, as desired.



Lemma 3.18 Suppose $s$ is an item size that does not evenly divide the bin capacity $B$
and we are asked to pack an arbitrarily long sequence of items of size $s$ using SS. Let
$d_i = is$, $0 \le i \le \lfloor B/s \rfloor$. For all $m > 0$, the packing in existence just before the first
time $N_P(d_1) > m$ must have $N_P(d_i) = mi$ for every $d_i$.

Proof. Let us say that $mi$ is the target for level $d_i$. We first show that it must be
the case that $N_P(d_i)$ is no more than its target, $1 \le i \le \lfloor B/s \rfloor$, so long as $N_P(d_1)$ has
never yet exceeded its target. Suppose not, and consider the packing just before the
first one of these counts, say $N_P(d_i)$, exceeded its target. In this packing we must have
$N_P(d_i) = mi$. Let $\Delta_h = N_P(d_h) - N_P(d_{h-1})$, $1 \le h \le \lfloor B/s \rfloor$, where by convention
$N_P(0) = 0$ and so $\Delta_1 = N_P(d_1)$. Since $\Delta_1$ by hypothesis is $m$ or less, Lemma 1.2 implies
that $\Delta_i < \Delta_1 \le m$. But then we must have $N_P(d_{i-1}) \ge (i - 1)m + 1$, contradicting
our assumption that level $d_i$ was the first to have its count exceed its target.

For the lower bound, note that in the packing just before $N_P(d_1)$ first exceeds $m$, it
must be the case that $\Delta_1 = N_P(d_1) = m$. Since this was the preferred move under SS,
it must be the case by Lemma 1.2 that $\Delta_i \ge m$, $2 \le i \le \lfloor B/s \rfloor$. The result follows. ■
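The staircase profile described by Lemma 3.18 can be observed directly by simulation. The following is a minimal sketch under stated assumptions: the function name `profile_before_overflow` is hypothetical, SS is modeled as choosing the legal move that minimizes the sum of squared level counts, and ties are broken toward the lowest level (an arbitrary choice made only for this illustration).

```python
def profile_before_overflow(B, s, m):
    """Pack items of size s with SS; return the level counts just before N_P(s) would exceed m."""
    counts = [0] * (B + 1)                   # counts[h] = number of bins at level h
    while True:
        best_delta, best_level = None, None
        for h in range(0, B - s + 1):        # h = 0 means "start a new bin"
            if h > 0 and counts[h] == 0:
                continue
            delta = 0
            if h > 0:
                delta += (counts[h] - 1) ** 2 - counts[h] ** 2
            if h + s < B:
                delta += (counts[h + s] + 1) ** 2 - counts[h + s] ** 2
            if best_delta is None or delta < best_delta:
                best_delta, best_level = delta, h
        # Level s gains a bin only when a new bin is started.
        if best_level == 0 and counts[s] == m:
            return {i * s: counts[i * s] for i in range(1, B // s + 1)}
        if best_level > 0:
            counts[best_level] -= 1
        if best_level + s < B:
            counts[best_level + s] += 1
```

With $B = 7$, $s = 2$ and $m = 1$, the returned counts for levels 2, 4, 6 are 1, 2, 3, matching $N_P(d_i) = mi$.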

Lemma 3.19 Suppose F is a fixed discrete distribution with at least one nontrivial
dead-end level $h$ and $H$ is a positive constant. Then there is a list $L_H$ of length $O(H)$
consisting solely of items with sizes in $U_F$, such that the packing resulting from using
SS to pack $L_H$ contains at least $H$ bins with dead-end levels.



Proof. By Lemma 3.17 there must be a set $S^* = \{x_0, x_1, \ldots, x_t\}$ of items with
sizes in $U_F$ whose total size is $h$, and for which $s(x_0)$ is not a divisor of $B$. Let
us also assume that all items of any given size appear contiguously in the sequence
$s(x_0), \ldots, s(x_t)$. Note that we may assume that $s(x_i) \ge 2$, $0 \le i \le t$, since if 1
were in $U_F$ there could be no dead-end levels. Let $h_i = \sum_{j=0}^{i} s(x_j)$, $0 \le i \le t$. Note
that $h_t = h$. Further, let $k = \lfloor B/s(x_0) \rfloor$ and $d_i = i \cdot s(x_0)$, $0 \le i \le k$.

Our list $L_H$ will consist of a sequence of $t + 1$ (possibly empty) segments, the first of
which (Segment 0) consists of $H3^t \sum_{i=1}^{k} i^2 = H3^t k(k+1)(2k+1)/6$ items of size $s(x_0)$.
In the packing $P$ obtained by using SS to pack these items, we will have by Lemma 3.18
that level $i \cdot s(x_0)$ will have count $iH3^t$, $1 \le i \le k$, and in particular level $h_0 = s(x_0)$ will
have count $H3^t$. In what follows we use "P" generically to denote the current packing.
Note that after Segment 0 has been packed, $P$ contains $H3^t \sum_{i=1}^{k} i = H3^t k(k+1)/2$
partially filled bins.

Segment 1 consists of the shortest possible sequence of items of size $s(x_1)$ that,
when added to $P$ using SS, will cause the count for level $h_1 = s(x_0) + s(x_1)$ to equal or
exceed $H3^{t-1}$. A sequence of this sort must exist for the following reasons: If $N_P(h_1)$
is itself $H3^{t-1}$ or greater, as for instance it would be if $s(x_1) = s(x_0)$, then the empty
segment will do. Otherwise, suppose $N_P(h_1) < H3^{t-1}$. So long as $N_P(h_0) \ge 2H3^{t-1}$
and $N_P(h_1) < H3^{t-1}$, placing an item of size $s(x_1)$ in a bin with level $h_0$ would cause
a greater reduction in $ss(P)$ than placing it in a bin of level $h_1$ could, and so would
be the preferred move. Since we can place $H3^{t-1}$ items in bins of level $h_0$ before
$N_P(h_0) < 2H3^{t-1}$, and each such placement would increase $N_P(h_1)$ by 1, this means
we will eventually have placed enough to increase $N_P(h_1)$ to the desired target. Note
that we will eventually be forced to place items in bins of level $h_0$ rather than some
level other than $h_0$ or $h_1$, since the existence of moves that decrease $ss(P)$ means that
no new bins are being created.

We complete our argument by induction. In general, we start Segment $j$, $2 \le j \le t$,
with a packing in which $N_P(h_{j-1}) \ge H3^{t-j+1}$ and no new bins have been created since
Segment 0. The segment then consists of the shortest possible sequence of items of size
$s(x_j)$ that will cause the count for level $h_j = h_{j-1} + s(x_j)$ to equal or exceed $H3^{t-j}$.
An argument analogous to that for Segment 1 says that this must eventually occur
without any additional bins being started. Thus at the end of Segment $t$ we have $H$ bins
with level $h_t = h$. Given that all the $s(x_j)$ are 2 or greater, the total number of items
included in Segments 1 through $t$, none of which started a new bin, is no more than
$BH3^t k(k+1)/4$ and so the total number of items in our overall list $L_H$ is at most

H3%k + l){2k + l) ^ BH3%k + 1) ^ ^3gs^ ^ ^^^^ 

for fixed F, as required. ■ 

For future reference, note that since 1 cannot be in $U_F$ if F has dead-end levels,
the number of segments in $L_H$ is less than $B/2$.

Lemma 3.20 Suppose $P$ and $Q$ are two packings for which

$$|P - Q| \equiv \sum_{h=1}^{B-1} |N_P(h) - N_Q(h)| \le M$$

and $L$ is a list consisting entirely of items of the same size $s \ge 2$. Then the packings
$P'$ and $Q'$ resulting from using SS to pack $L$ into $P$ and $Q$ satisfy $|P' - Q'| \le BM$.

Proof. We prove the lemma for the special case of $M = 1$. The general result then
follows by repeated applications of this $M = 1$ case. So assume $|P - Q| = 1$.

Let $g$ denote the level that has different counts under $P$ and $Q$ and suppose without
loss of generality that $N_P(g) = N_Q(g) + 1$. Let $P_i$ and $Q_i$ denote the packings that
result after the first $i$ items of $L$ have been packed into $P$ and $Q$ respectively. We will
say that a triple $(i,j,\ell)$, $0 \le i,j \le |L|$ and $0 \le \ell \le B$, is a compatible triple if either

1. $P_i = Q_j$ and $\ell \in \{0,B\}$, or

2. $|P_i - Q_j| = 1$, and $\ell$ is the unique bin level such that $1 \le \ell \le B - 1$, $N_{P_i}(\ell) = N_{Q_j}(\ell) + 1$.

Note that by this definition $(0, 0, g)$ is a compatible triple.

Claim 3.20.1 If $(i,j,\ell)$ is a compatible triple with $i,j < |L|$ then one of the following
three triples must also be compatible:

$$(i+1,\, j+1,\, \ell), \quad (i+1,\, j,\, \ell+s), \quad (i,\, j+1,\, \ell-s).$$

Proof of Claim. Consider the packings $P_i$ and $Q_j$. Suppose SS would place an
item of size $s$ in bins with the same level in both $P_i$ and $Q_j$, as for instance it must
if $\ell \in \{0,B\}$ and hence the two packings have identical level counts. Then the same
bin counts would be changed in the same way for $P_i$ and $Q_j$ and so $(i+1, j+1, \ell)$ is
a compatible triple.

Otherwise suppose SS would place an item of size $s$ in a bin of level $h_P$ for $P_i$ and of
level $h_Q$ for $Q_j$, with $h_P \ne h_Q$. In this case $P_i$ and $Q_j$ must be different, and we are in case
2 of compatibility. Let $\Delta_Q(h)$ (resp. $\Delta_P(h)$) denote the net reduction in the sum of
squares if an item of size $s$ is placed in a bin of level $h$ in $Q_j$ (resp. $P_i$), assuming such
a placement is legal. Since the bin counts $N_{P_i}(h)$ and $N_{Q_j}(h)$ are equal for every $h$
other than $\ell$, it follows that $\Delta_P(h) = \Delta_Q(h)$ for all $h$'s other than $\ell$ and $\ell - s$. Since
SS makes different choices for $P_i$ and for $Q_j$, it must be that at least one of $h_Q$, $h_P$ is
either $\ell$ or $\ell - s$. By hypothesis we have $N_{P_i}(\ell) = N_{Q_j}(\ell) + 1$ and all other counts are
equal, so $\Delta_P(\ell - s) < \Delta_Q(\ell - s)$ (if $\ell - s \ge 0$), $\Delta_P(\ell) > \Delta_Q(\ell)$ (if $\ell + s \le B$), and for
all other values of $h$, $\Delta_P(h) = \Delta_Q(h)$.

Thus if $h_P = \ell - s$ we must have $h_Q = \ell - s = h_P$, given that it is even more
valuable to place an item of size $s$ into a bin of level $\ell - s$ in $Q_j$ than in $P_i$. Similarly,
if $h_Q = \ell$ then we must have $h_P = \ell$. Since by assumption $h_P \ne h_Q$, this means that
either $h_P = \ell$ or $h_Q = \ell - s$.

In the first case, $h_P = \ell$, we must have $\ell + s \le B$. Packing an item of size
$s$ into a bin with level $\ell$ in $P_i$ reduces $N_{P_i}(\ell)$ by 1, so that $N_{P_{i+1}}(\ell) = N_{Q_j}(\ell)$. If
$\ell + s = B$, i.e. we fill up a bin, then $|P_{i+1} - Q_j| = 0$, and so $(i+1, j, \ell + s = B)$ is
a compatible triple. If $\ell + s < B$ then $N_{P_i}(\ell + s)$ will increase by 1 and we will have
$N_{P_{i+1}}(\ell + s) = N_{P_i}(\ell + s) + 1 = N_{Q_j}(\ell + s) + 1$, while all other levels now have the same
counts. Thus $(i+1, j, \ell + s)$ is again a compatible triple.

In the second case, $h_Q = \ell - s$, we must have $\ell - s \ge 0$. Packing an item of size $s$
into a bin with level $\ell - s$ in $Q_j$ increases $N_{Q_j}(\ell)$ by 1, so that $N_{P_i}(\ell) = N_{Q_{j+1}}(\ell)$. If
$\ell - s = 0$, i.e. we pack $s$ into a new bin, then $|P_i - Q_{j+1}| = 0$, and so $(i, j+1, \ell - s = 0)$
is a compatible triple. If $\ell - s > 0$ then $N_{Q_j}(\ell - s)$ will decrease by 1 and we will have
$N_{Q_{j+1}}(\ell - s) = N_{Q_j}(\ell - s) - 1 = N_{P_i}(\ell - s) - 1$, while all other levels now have the
same counts. Thus $(i, j+1, \ell - s)$ is again a compatible triple.

This completes the proof of the Claim. ■

Given the Claim and the fact that $(0,0,g)$ is a compatible triple, we have by
induction that at least one of the three following scenarios must hold:

1. $(|L|, |L|, g)$ is a compatible triple, or

2. There is an integer $a$, $1 \le a \le (B - g)/s$ such that $(|L|, |L| - a, g + as)$ is a
compatible triple, or

3. There is an integer $b$, $1 \le b \le g/s$ such that $(|L| - b, |L|, g - bs)$ is a compatible
triple.

In the first case we have $|P' - Q'| = 1$, which clearly satisfies the Lemma's conclusion.
In the second we have $|P' - Q_{|L|-a}| = 1$, but to get $Q'$ from $Q_{|L|-a}$ we will need to add
$a$ additional items of size $s$, and each addition will change one or two level counts by
1. Since $s \ge 2$ and $g \ge 1$, we must have $a \le (B - g)/s \le (B - 1)/2$. Thus we can
conclude that $|P' - Q'| \le 1 + B - 1 = B$ as desired. The third case follows analogously
and the Lemma is proved. ■



Lemma 3.21 Suppose F is a fixed discrete distribution with at least one nontrivial
dead-end level and $X$ is a positive constant. Then for any $D > X$ there is a list $L_{X,D}$
of length $O(D)$ consisting solely of items with sizes in $U_F$, such that for any packing
$P$ with no live-level count exceeding $X$, the packing $Q$ resulting from using SS to add
$L_{X,D}$ to $P$ contains at least $D$ bins with dead-end levels.

Proof. We may assume that $P$ contains fewer than $D$ bins with dead-end levels,
because the number of bins with dead-end levels can never decrease and if we already
had $D$ such bins any list will do for $L_{X,D}$. Let $h$ be a nontrivial dead-end level for F.

For our list we simply let $L_{X,D}$ be the list $L_H$ derived for $h$ using Lemma 3.19, with
$H = (XB + D)B^{B/2} + D = O(D)$ for fixed F. By Lemma 3.19 the length of $L_H$ will
be $O(H) = O(D)$.

If $P_0$ denotes the empty packing, we know by Lemma 3.19 that if SS is used to
pack $L_H$ into $P_0$ it will create a packing $P_0'$ with at least $H$ bins having the dead-end
level $h$. Let $P'$ denote the packing that would result if we used SS to add $L_H$ to $P$.
Note that $|P - P_0| = \sum_{i=1}^{B-1} N_P(i) \le X(B - 1) + D - 1$. Thus by applying Lemma
3.20 once for each segment of $L_H$ and using the fact that $L_H$ contains less than $B/2$
segments, we have that $|P' - P_0'| \le B^{B/2}(XB + D)$. But this means that for dead-end
level $h$ we must have $N_{P'}(h) \ge H - B^{B/2}(XB + D) = D$ and so $P'$ contains at least
the desired number of bins with dead-end levels. ■



Lemma 3.22 Let $P_N$ be the packing after $N$ items generated according to F have been
packed by SS. There is a constant $X$, depending only on F, such that for any $N > 0$

$$p\left[N_{P_N}(i) \le X \text{ for all live levels } i\right] \ge \frac{1}{2} \qquad (3.26)$$

Proof. Recall from the inequality (3.8) of the proof of the $O(\log n)$ upper bound on
the expected waste of SS that for any $N > 0$, if $X_N$ is the profile after packing $N$
items, then there are constants $c$ and $T$, depending only on F, such that

$$E\left[c^{\psi(X_N)}\right] \le T$$

This meant that

$$p\left[c^{\psi(X_N)} \ge 2T\right] \le \frac{1}{2}$$

and hence that

$$p\left[\psi(X_N) \ge \log_c(2T)\right] \le \frac{1}{2}$$

Since as we have repeatedly observed $\psi(x) \ge x_h$ for every live level $h$, this in turn
means that the probability is at least 1/2 that no live level count exceeds $\log_c(2T)$.
Thus the Lemma holds with $X = \log_c(2T)$. ■

We are now in a position to prove our $\Omega(\log n)$ lower bound on $EW_n^{SS}(F)$ when
F has nontrivial dead-end levels. We may assume without loss of generality that all
the sizes $s_1, \ldots, s_J$ specified by F are in $U_F$, i.e., that $p_j > 0$, $1 \le j \le J$. Consider
the lists $L_{X,D}$ specified by Lemma 3.21 for the value of $X$ given by Lemma 3.22, and
let $l_D$ denote the length of $L_{X,D}$. Since the value of $X$ depends only on F, Lemma
3.21 implies that there is a constant $c$, depending only on F, such that for all $D > X$,
$l_D \le cD$.

Now suppose we have a random list $L$ of length $cD$ of items generated according to
F. The probability that $L_{X,D}$ is a prefix of $L$ is at least $\epsilon^{cD}$, where $\epsilon = \min\{p_j : 1 \le
j \le J\}$. Let $a = \log_2(1/\epsilon)$. Then the probability that $L_{X,D}$ is not a prefix of $L$ is at
most $1 - (1/2)^{acD}$.

Now consider a random list $L^*$ of length $cD2^{acD}$, viewed as a sequence of $2^{acD}$
random segments of length $cD$. The probability that none of these segments has $L_{X,D}$
as a prefix is

$$\le \left(1 - \frac{1}{2^{acD}}\right)^{2^{acD}} \le \frac{1}{e} < \frac{1}{2}.$$

In other words, the probability that at least one of these segments has $L_{X,D}$ as a
prefix exceeds 1/2. Consider the last segment that has $L_{X,D}$ as a prefix (should any such
segments exist), and the packing $P$ that exists just before this copy of $L_{X,D}$ is packed.
Note that by choosing the last such segment, we do not condition in any way the list
that precedes this copy or the packing $P$. Hence by Lemma 3.22, with probability at
least 1/2 the packing $P$ has no live level count exceeding $X$, and by Lemma 3.21, after
the segment is added to the packing, the new packing (and all subsequent ones) will
contain at least $D$ bins with dead-end levels. Thus the expected number of bins with
dead-end levels after all of $L^*$ is packed is at least $(1/2)(1/2)D = D/4 = \Omega(\log |L^*|)$.
The lower bound follows. ■
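The probability estimate used in this argument is easy to sanity-check numerically. The sketch below is illustrative only: the function name and the sample values of $\epsilon$, $c$ and $D$ are assumptions chosen for the example, not values from the paper. It evaluates $(1 - \epsilon^{cD})^{2^{acD}}$ with $a = \log_2(1/\epsilon)$, so that the number of segments equals $\epsilon^{-cD}$.

```python
import math

def prob_no_segment_has_prefix(eps, c, D):
    """Evaluate (1 - eps**(c*D)) ** (2**(a*c*D)) with a = log2(1/eps).

    eps**(c*D) lower-bounds the chance that one segment starts with the target list,
    and 2**(a*c*D) = eps**(-c*D) is the number of independent segments.
    """
    hit = eps ** (c * D)
    segments = 1.0 / hit
    return (1.0 - hit) ** segments

# Example with hypothetical parameters: the value stays below 1/e, hence below 1/2.
example = prob_no_segment_has_prefix(0.25, 1, 3)
```

The value is always below $1/e$ because $(1 - x)^{1/x} < 1/e$ for $0 < x < 1$.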



4 SS and Linear Waste Distributions 

The implication of Theorem 2.4 that $ER_\infty^{SS}(F) = 1$ for all perfectly packable
distributions F unfortunately does not carry over to the case where $EW_n^{OPT}(F) = \Theta(n)$.

Theorem 4.1 There exist distributions $F_k$, $1 \le k < \infty$, such that

$$\limsup_{k \to \infty} ER_\infty^{SS}(F_k) = 1.5 .$$

Proof. Let $F_k$ be the distribution in which the bin size is $B = 2k + 1$ and the single
item size 2 occurs with probability 1. Consider an $n$-item list $L_n$ generated according
to $F_k$ where $n$ is divisible both by $k$ and by $\sum_{i=1}^{k} i^2 = k(k+1)(2k+1)/6$. Then
$OPT(L_n) = n/k$ and by Lemma 3.18, we have

$$SS(L_n) = n \left( \frac{k(k+1)/2}{k(k+1)(2k+1)/6} \right) = \frac{3n}{2k+1}$$

Thus $ER_\infty^{SS}(F_k)$, which is defined as a limsup, equals $3k/(2k+1)$ and the Theorem
follows. ■
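The ratio in this proof can be reproduced on a small instance. The following is a minimal sketch, assuming a hypothetical helper `ss_bins_single_size` that models SS as minimizing the sum of squared level counts with ties broken toward the lowest level. For $k = 2$ (so $B = 5$) and $n = 10$ items of size 2, it opens 6 bins while an optimal packing uses $n/k = 5$.

```python
from fractions import Fraction

def ss_bins_single_size(B, s, n):
    """Number of bins SS opens when packing n items of size s into bins of capacity B."""
    counts = [0] * (B + 1)                   # counts[h] = number of bins at level h
    bins = 0
    for _ in range(n):
        best_delta, best_level = None, None
        for h in range(0, B - s + 1):        # h = 0 means "start a new bin"
            if h > 0 and counts[h] == 0:
                continue
            delta = 0
            if h > 0:
                delta += (counts[h] - 1) ** 2 - counts[h] ** 2
            if h + s < B:
                delta += (counts[h + s] + 1) ** 2 - counts[h + s] ** 2
            if best_delta is None or delta < best_delta:
                best_delta, best_level = delta, h
        if best_level == 0:
            bins += 1
        else:
            counts[best_level] -= 1
        if best_level + s < B:
            counts[best_level + s] += 1
    return bins

k, n = 2, 10
ratio = Fraction(ss_bins_single_size(2 * k + 1, 2, n), n // k)   # SS(L_n) / OPT(L_n)
```

Here `ratio` equals $3k/(2k+1) = 6/5$, in agreement with the closed form above.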

We conjecture that 3/2 is the worst possible value for $ER_\infty^{SS}(F)$ over all discrete
distributions F, although at present the best upper bound we can prove is 3, which is
implied by the following worst-case result.

Theorem 4.2 For all lists $L$, $SS(L) \le 3\lceil s(L)/B \rceil \le 3\,OPT(L)$.

Proof. Let $x$ be the last item of size less than $B/3$ that starts a new bin and let $s$ be
the size of $x$. (If no such $x$ exists, then all bins are at least $B/3$ full in the final packing
and we are done.) Let $P$ be the packing just before $x$ was packed. It is sufficient to
show that the average bin content in the bins of $P$ is at least $B/3$. If that is so, then
the packing of subsequent items cannot reduce the average bin content in the bins not
containing $x$ to less than $B/3$. Consequently if $m$ is the final number of bins in the
packing, we must have $s(L)/B > (m - 1)/3$ and hence $OPT(L) \ge \lceil s(L)/B \rceil \ge m/3$
and the theorem follows.

So let us show that the average bin content in the bins of $P$ is at least $B/3$.
For $1 \le j \le s$, let $\ell_j$ be the greatest integer such that $j + \ell_j s < B$ and let $\Omega_j$ denote the
set of bins with contents $j, j + s, \ldots, j + \ell_j s$. Note that $\Omega_1, \ldots, \Omega_s$ is a partition of the
bins of $P$ into $s$ sets, and if we can show that the average contents of the bins in each
nonempty $\Omega_j$ is at least $B/3$, we will be done. Fix $j$ and suppose $k$ is the least integer
such that either $N_P(j + ks) > 0$ or $j + ks \ge B/3$. If $j + ks \ge B/3$ then every bin in $\Omega_j$
has contents at least $B/3$ and so $\Omega_j$ behaves as desired. So suppose $j + ks < B/3$, in
which case we must have $k < \ell_j$. Since SS places $x$ in a new bin, we must by Lemma
1.2 have

$$0 \le N_P(s) \le N_P(j + hs + s) - N_P(j + hs), \quad h = k, \ldots, \ell_j - 1$$

and hence $N_P(j + \ell_j s) \ge \cdots \ge N_P(j + ks)$. This means that if we let $t = j + ks$ the
average contents of the bins in $\Omega_j$ is at least

$$\frac{t + (t + s) + \cdots + (t + (\ell_j - k)s)}{\ell_j - k + 1} = \frac{2t + (\ell_j - k)s}{2} \ge \frac{j + \ell_j s}{2} \ge \frac{B - s}{2} > \frac{B}{3}. \;\; ■$$
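The worst-case bound of Theorem 4.2 can be spot-checked on random lists. The following is a minimal sketch rather than a proof: `ss_pack` is a hypothetical implementation that models SS as choosing the legal move minimizing the sum of squared level counts over levels 1 to $B-1$, with ties broken toward the lowest level, and `bound_holds` compares the number of bins it opens against $3\lceil s(L)/B \rceil$.

```python
import random

def ss_pack(items, B):
    """Pack integer item sizes (each between 1 and B) with SS; return the number of bins used."""
    counts = [0] * (B + 1)                   # counts[h] = bins at level h; full bins are not tracked
    bins = 0
    for s in items:
        best_delta, best_level = None, None
        for h in range(0, B - s + 1):        # h = 0 means "start a new bin"
            if h > 0 and counts[h] == 0:
                continue
            delta = 0
            if h > 0:
                delta += (counts[h] - 1) ** 2 - counts[h] ** 2
            if h + s < B:
                delta += (counts[h + s] + 1) ** 2 - counts[h + s] ** 2
            if best_delta is None or delta < best_delta:
                best_delta, best_level = delta, h
        if best_level == 0:
            bins += 1
        else:
            counts[best_level] -= 1
        if best_level + s < B:
            counts[best_level + s] += 1
    return bins

def bound_holds(items, B):
    """Check SS(L) <= 3 * ceil(s(L) / B) for one list."""
    ceil_load = -(-sum(items) // B)          # integer ceiling of s(L)/B
    return ss_pack(items, B) <= 3 * ceil_load
```

A typical use is to draw item sizes uniformly from $1, \ldots, B$ with a seeded `random.Random` and confirm that `bound_holds` returns `True` for every sampled list.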
5 Identifying Perfectly Packable Distributions

Given the observations of the previous section, it would be valuable to be able to
identify those distributions F that satisfy the hypotheses of Theorem 2.4, i.e., those
for which $EW_n^{OPT}(F) = O(\sqrt{n})$ and hence $ER_\infty^{SS}(F) = 1$ is guaranteed. This task is
unfortunately NP-complete, as it would require us to solve the PARTITION problem
[GJ79]. Fortunately, however, the problem is not NP-complete in the strong sense,
and as we shall now see, can be solved in time pseudo-polynomial in $B$ via linear
programming, as was claimed but not proved in [CJK+99].

Suppose our discrete distribution is as described above, with a bin capacity $B$,
integer item sizes $s_1, s_2, \ldots, s_J$, and rational probabilities $p_1, p_2, \ldots, p_J$. We may assume
without loss of generality that all these probabilities are positive. Our linear program,
which for future reference we shall call the "Waste LP for F," will have $JB$ variables
$v(j,h)$, $1 \le j \le J$ and $0 \le h \le B - 1$, where $v(j,h)$ represents the rate at which items
of size $s_j$ go into bins whose current level is $h$. The constraints are:

$$v(j,h) \ge 0, \quad 1 \le j \le J, \; 0 \le h \le B - 1 \qquad (5.1)$$
$$v(j,h) = 0, \quad 1 \le j \le J, \; s_j > B - h \qquad (5.2)$$
$$\sum_{h=0}^{B-1} v(j,h) = p_j, \quad 1 \le j \le J \qquad (5.3)$$
$$\sum_{j=1}^{J} v(j,h) \le \sum_{j=1}^{J} v(j, h - s_j), \quad 1 \le h \le B - 1 \qquad (5.4)$$

where by definition the value of $v(j, h - s_j)$ when $h - s_j < 0$ is taken to be 0 for all
$j$. Constraints (5.2) say that no item can go into a bin that is too full to have room
for it. Constraints (5.3) say that all items must be packed. Constraints (5.4) say that
bins with a given level are created at least as fast as they disappear. The goal is to
minimize

$$c(F) = \sum_{h=1}^{B-1} \left( (B - h) \cdot \left( \sum_{j=1}^{J} v(j, h - s_j) - \sum_{j=1}^{J} v(j,h) \right) \right) \qquad (5.5)$$

Note for future reference that by definition we must have $c(F) \le B - 1$.

In what follows, $c(F)$ will always denote the optimal solution value for the Waste
LP for F, and $ES(F)$ will denote the expected item size under F, i.e., $\sum_{j=1}^{J} p_j s_j$.
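To make the constraints concrete, the following sketch checks a candidate assignment of rates against (5.1) through (5.4) and evaluates the objective (5.5) in exact rational arithmetic. It is illustrative only: the function name `waste_lp_objective` and the dictionary representation of $v(j,h)$ are assumptions made for this example, indices $j$ are 0-based, and the function evaluates a given feasible point rather than solving the LP.

```python
from fractions import Fraction

def waste_lp_objective(B, sizes, probs, v):
    """Validate a candidate solution and return the value of objective (5.5).

    v maps (j, h) to a rate, with j indexing `sizes`; missing entries count as 0.
    Raises ValueError if one of the constraints (5.1)-(5.4) is violated.
    """
    J = len(sizes)

    def rate(j, h):
        # Levels outside 0..B-1 carry no flow (the convention used for h - s_j < 0).
        return v.get((j, h), Fraction(0)) if 0 <= h <= B - 1 else Fraction(0)

    for j in range(J):
        for h in range(B):
            if rate(j, h) < 0:
                raise ValueError("(5.1) violated")
            if sizes[j] > B - h and rate(j, h) != 0:
                raise ValueError("(5.2) violated")
        if sum(rate(j, h) for h in range(B)) != probs[j]:
            raise ValueError("(5.3) violated")

    total = Fraction(0)
    for h in range(1, B):
        leaving = sum(rate(j, h) for j in range(J))              # bins of level h that receive an item
        arriving = sum(rate(j, h - sizes[j]) for j in range(J))  # bins that reach level h
        if leaving > arriving:
            raise ValueError("(5.4) violated")
        total += (B - h) * (arriving - leaving)                  # gap B - h times net creation rate
    return total

half = Fraction(1, 2)
perfect = waste_lp_objective(3, [1, 2], [half, half], {(1, 0): half, (0, 2): half})
wasteful = waste_lp_objective(5, [2], [Fraction(1)], {(0, 0): half, (0, 2): half})
```

In the first example (sizes 1 and 2, $B = 3$) the objective is 0, as expected for a perfectly packable distribution. In the second (single size 2, $B = 5$) it is $1/2$, and this is also the LP optimum because every bin holds at most two items, so $(n/B)(ES(F) + c(F)) = (n/5)(2 + 1/2) = n/2$ matches the optimal packing.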

Lemma 5.1 Suppose F is a discrete distribution and let $L_n(F)$ be a random $n$-item
list generated according to F.

1. For all $n > 0$,

$$\left| EW_n^{OPT}(F) - \frac{n\,c(F)}{B} \right| \le O(\sqrt{n}).$$

2. There exist constants $b$ and $N^*$ such that for all $n > N^*$

$$p\left[\, \left| OPT(L_n(F)) - \frac{n}{B}\left(ES(F) + c(F)\right) \right| > b\,n^{2/3} \right] < \frac{1}{n^{1/6}}.$$



This lemma, which we shall prove shortly, implies the following three results.

Theorem 5.2 Suppose F is a discrete distribution. Then

$$\limsup_{n \to \infty} \left( \frac{EW_n^{OPT}(F)}{n} \right) = \frac{c(F)}{B}.$$

Theorem 5.3 Suppose F is a discrete distribution. Then $EW_n^{OPT}(F) = O(\sqrt{n})$ if
and only if $c(F) = 0$.

Lemma 5.4 Suppose F is a discrete distribution and $A$ is a (possibly randomized) bin
packing algorithm for which $E[A(L)]/OPT(L) \le b$ for some fixed constant $b$ and all
lists $L$. Then

$$ER_\infty^{A}(F) = \frac{ES(F) + B \cdot \limsup_{n \to \infty} EW_n^{A}(F)/n}{ES(F) + c(F)}.$$



Theorems 5.3 and 5.2 are immediate consequences of claim (1) of Lemma 5.1.
Lemma 5.4 follows from claim (2). Basically, it says that $ER_\infty^{A}(F)$, which is defined in
terms of expected ratios, can actually be computed in terms of ratios of expectations.
It follows because (2) implies that we can divide the set of lists $L$ of length $n$ generable
according to F into two sets. For the first set, which has cumulative probability
1 — we have 



E 



A{L) 



OPT{L)) 



nES{F) + B ■ EW^{F) 
nES{F) + nc{F) 



1 + 



1 



n 



1/3 



(5.6) 



For the second set, which has cumulative probability E[A{L) / O FT [L]] < b. 

Thus this set contributes at most b/n}/^ to the overall expected ratio for Ln{F), mean- 
ing that ( ^.6|) holds with L replaced by Ln{F) and replaced by Lemma 
^ follows. We now turn to the proof of Lemma ETT. 



Proof. Consider the values $v(j,h)$ of the variables in an optimal basic solution to
the LP. Since all the coefficients and right-hand sides of the LP are rational, all these
variable values must be rational as well, and there exists a positive integer $N$ such that
$Nv(j,h)$ is an integer, $1 \le j \le J$ and $0 \le h < B$. For each positive integer $k$, let $L_k$
be a list consisting of $k \sum_{h=0}^{B-1} Nv(j,h)$ items of size $s_j$, $1 \le j \le J$. By (5.3) $L_k$ will
contain $kNp_j$ items of size $s_j$ for each $j$, for a total of $kN$ items. We will thus have
$s(L_k) = kN \cdot ES(F)$.

Note that we can construct a packing of $L_k$ simply by following the instructions
provided by the variable values in the solution to the LP. That is, for each $j$, start
$kNv(j,0)$ bins by placing an item of size $s_j$ into an empty bin. By (5.4), the number of
bins of level 1 will now be at least $\sum_{j=1}^{J} kNv(j,1)$. Thus we can take a set consisting
of $kNv(j,1)$ items of size $s_j$, $1 \le j \le J$, and place each of these items in a distinct bin
with level 1. We can now proceed to pack bins of level 2, and so on. Let $P_k$ denote the
resulting packing.

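How many bins does this packing contain? A bin in $P_k$ that has level $h$ contains
items of total size $h$ by definition, and in addition has a gap of size $B - h$. Thus the
total number of bins is simply $1/B$ times the sum of the item sizes and the sum of
the gap sizes, that is

$$|P_k| = \frac{1}{B} \left( kN \cdot ES(F) + \sum_{h=1}^{B-1} (B - h) \cdot kN \cdot \left( \sum_{j=1}^{J} v(j, h - s_j) - \sum_{j=1}^{J} v(j,h) \right) \right)$$

and hence

$$|P_k| = \frac{kN}{B} \left( ES(F) + c(F) \right). \qquad (5.7)$$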

Now, since $L_k$ is in essence the "expected value" of the random list $L_{kN}(F)$, we can
use the packings $P_k$ as models for packing the random lists $L_n(F)$, $n > 0$. We proceed
as follows: Given $n$, find that $k \ge 0$ such that $kN \le n < (k + 1)N$. Now note that the
packing $P_k$ has $kNp_j$ "slots" for items of size $s_j$, $1 \le j \le J$, and $L_n(F)$ is expected
to have between $kNp_j$ and $(k + 1)Np_j$ such items. Place as many items of $L_n$ into
the appropriate slots as possible, and then place the leftover items in additional bins,
one per bin. The total number of bins used will then be $|P_k|$ plus the number $X_n$ of
leftover items, which implies that

$$OPT(L_n(F)) \le \frac{n}{B}\left( ES(F) + c(F) \right) + X_n. \qquad (5.8)$$

Let $n_j$ denote the number of items of size $s_j$ among the first $kN$ items of $L_n(F)$
and define

$$\Delta_j^+ = \max\{0,\, n_j - kNp_j\}, \quad 1 \le j \le J$$
$$\Delta_j^- = \max\{0,\, kNp_j - n_j\}, \quad 1 \le j \le J$$

Thus $\Delta_j^+$ is the oversupply of items of size $s_j$ among the first $kN$ items and $\Delta_j^-$ is
the shortfall. The number of leftover items among the first $kN$ items of $L_n$ is hence
$\sum_{j=1}^{J} \Delta_j^+ = \sum_{j=1}^{J} \Delta_j^-$, and so $X_n \le N + \sum_{j=1}^{J} \Delta_j^+$. Since each $n_j$ is a sum of
independent Bernoulli variables when considered by itself, we have $E[\Delta_j^+] \le \sqrt{kNp_j(1 - p_j)} \le
\sqrt{kNp_j}$. Given that $\sum_{j=1}^{J} \sqrt{kNp_j}$ is maximized when all the probabilities are equal,
we have that $E[\sum_{j=1}^{J} \Delta_j^+] \le J\sqrt{kN/J} \le \sqrt{nJ}$ and so $E[X_n] \le N + \sqrt{nJ} = O(\sqrt{n})$
since $N$ and $J$ are constants.
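The two estimates used in this step can be spelled out as follows. This is a sketch of one standard way to justify them and is not taken from the original text; it uses only Jensen's inequality, the variance of a binomial count, and the Cauchy-Schwarz inequality, with $n_j$ binomially distributed with parameters $kN$ and $p_j$.

$$E[\Delta_j^+] \le E\,|n_j - kNp_j| \le \sqrt{E\left[(n_j - kNp_j)^2\right]} = \sqrt{kNp_j(1 - p_j)},$$

$$\sum_{j=1}^{J} \sqrt{kNp_j} \le \sqrt{J \sum_{j=1}^{J} kNp_j} = \sqrt{JkN} = J\sqrt{kN/J}.$$

The second line is an equality exactly when all $p_j$ equal $1/J$, which is the sense in which the sum is maximized by equal probabilities.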



Since $X_n$ is a nonnegative random variable, we thus can conclude from (5.8) that
Claim (2) of the lemma holds when the quantity inside the absolute value signs is
positive. Since $E[s(L_n(F))] = nES(F)$ we can also conclude that

$$EW_n^{OPT}(F) = E\left[ OPT(L_n(F)) - \frac{s(L_n(F))}{B} \right] \le \frac{n\,c(F)}{B} + O(\sqrt{n}) \qquad (5.9)$$

and so (1) also holds when the quantity inside the absolute value signs is positive.

To prove that (1) and (2) hold when the quantities inside the absolute value signs
are negative, first observe that the packing defined above for $L_k$ must be an optimal
packing for $L_k$. If not, i.e., if $OPT(L_k) < (kN/B)(ES(F) + c(F))$, then we could
use an optimal packing for $L_k$ to define a better solution to our LP, contradicting our
assumption that $c(F)$ was the optimal solution value for the LP.

Next observe that if we are given a packing $P$ for $L_n(F)$, we can construct a closely
related one for $L_k$ (as defined above, with $k = \lfloor n/N \rfloor$), by a process of addition. For
each of the at most $\sum_{j=1}^{J} \Delta_j^- = \sum_{j=1}^{J} \Delta_j^+$ items in $L_k$ that do not have counterparts
of the same size in $L_n$, we add a new bin to $P$ containing just that item. This new
packing contains at least as many items of each size as does $L_k$ and so must contain at
least $OPT(L_k)$ bins. Thus by (5.7) we must have

$$OPT(L_n(F)) + \sum_{j=1}^{J} \Delta_j^- \ge OPT(L_k) = \frac{kN}{B}\left( ES(F) + c(F) \right) \qquad (5.10)$$

Claims (1) and (2) then follow by the same analysis of $E[\sum_{j=1}^{J} \Delta_j^-]$ as was used
when the quantity inside the absolute value signs was positive. ■

Thus one can determine whether $EW_n^{OPT}(F)$ is sublinear and, if it is not, compute
the constant of proportionality on the expected linear waste, all in the time it takes to
construct and solve the Waste LP for F. The worst-case time for this process obeys
the following bound.

Theorem 5.5 Given a description of a discrete distribution F in which all probabilities
are presented as rational numbers with a common denominator D > B, the Waste LP
for F can be constructed and solved in time

$O((JB)^{4.5} \log^2 D) = O(B^9 \log^2 D)$.

Proof. Given its straightforward description, the LP can clearly be constructed in
time proportional to its size, so construction time will be dominated by the time to
solve the LP. For that, the best algorithm currently available is that of Vaidya [Vai89],
which runs in time $O((M + N)^{1.5} N L^2)$, where M is the larger of the number of variables
and the number of constraints (the latter including the "$\ge 0$" constraints), N is
the smaller, and L is a measure of the number of bits needed in the computation if all
operations are to be performed in exact arithmetic.

Our LP has JB variables and the number of constraints is $\Theta(JB)$. Thus for our
LP the running time is $O((JB)^{2.5} L^2) = O(B^5 L^2)$. To obtain a bound on L, note that
all coefficients in the constraints of the LP are 1, 0, or $-1$ and the coefficients in the
objective function are all O(B). This leaves the probabilities $p_j$ to worry about. Note
that we can determine c(F) by solving the LP with each $p_j$ replaced by its numerator
(the integer $Dp_j$), and then dividing the answer by D. If we proceed in this way, then
all the "probabilities" are integers bounded by D. Following the precise definition of
L given in [Vai89] we can then conclude that $L = O(JB \log D)$, giving us the overall
running time bound claimed. ■

Although this running time bound is pseudopolynomial in B, it will be polynomial
if B is polynomially bounded in terms of J, which is true for many of the distributions
of interest in practice. Moreover, much better running times are obtainable in practice
by using commercial primal simplex codes rather than interior point techniques to
solve the LP's. See [ABD+], which details simplex-based methods that can be used to
compute c(F) in reasonable time for discrete distributions with J and B as large as
1,000 and 10,000, respectively.

In the remainder of this section, we will show how we can further distinguish between
the cases in which $EW_n^{OPT}(F) = \Theta(\sqrt{n})$ and those in which $EW_n^{OPT}(F) = O(1)$.
Our goal is to distinguish cases (a) and (b) in the Courcoubetis-Weber theorem, as described
in Section 2. Thus we need to determine, given that $\bar{p}_F$ is in $\Lambda_F$, whether it is
also in the interior of $\Lambda_F$. Our approach is based on solving J additional, related LP's.
The total running time will simply be J + 1 times that for solving the original LP, and
so we will be able to determine whether $EW_n^{OPT}(F) = O(\sqrt{n})$ and if so, which of the
two cases holds, in total time $O(J^{5.5} B^{4.5} \log^2 D) = O(B^{10} \log^2 D)$.

For each i, $1 \le i \le J$, let $x_i \ge 0$ be a new variable and let $LP_i$ denote the linear
program obtained from the Waste LP for F by (1) changing the inequalities in (5.4)
to equalities, (2) replacing (5.3) by

$\sum_{h=0}^{B-1} v(i,h) = p_i + x_i$

$\sum_{h=0}^{B-1} v(j,h) = p_j, \quad 1 \le j \le J,\ j \ne i,$ (5.11)

and (3) changing the optimization criterion to "maximize $x_i$." Let $c_i(F)$ denote the
optimal objective function value for $LP_i$. Note that $LP_i$ is feasible for $x_i = 0$ whenever
c(F) = 0, so that $c_i(F)$ is always well-defined and non-negative in this case.

Theorem 5.6 If F is a discrete distribution, then $EW_n^{OPT}(F) = O(1)$ if and only if
c(F) = 0 and $c_i(F) > 0$, $1 \le i \le J$.

Proof. Combining the Courcoubetis-Weber Theorem with Theorem 5.3 we know that
for all discrete distributions F,

$\bar{p}_F \in \Lambda_F$ if and only if c(F) = 0. (5.12)

Let $q(i,\beta)$ denote the vector obtained from $\bar{p}_F$ by setting $q_i = p_i + \beta$ and $q_j = p_j$,
$j \ne i$. By (5.12) and the construction of the linear programs $LP_i$, it is
easy to see that $q(i,\beta)$ is in $\Lambda_F$ if and only if $LP_i$ is feasible when $x_i = \beta$. Thus by
convexity, $q(i,\beta)$ is in $\Lambda_F$ if and only if $0 \le \beta \le c_i(F)$.

Let us first suppose that the stated properties of c(F) and the $c_i(F)$'s do not hold.
If $c(F) \ne 0$, then $\bar{p}_F$ is not even in $\Lambda_F$, much less in its interior. So suppose c(F) = 0
but $c_i(F) = 0$ for some i, $1 \le i \le J$. Then for any $\epsilon > 0$ there is a vector q with
$|q - \bar{p}_F| \le \epsilon$ that is not in $\Lambda_F$, namely $q(i,\epsilon)$. Thus by definition $\bar{p}_F$ is not in the interior
of $\Lambda_F$.

On the other hand, suppose c(F) = 0 and $c_i(F) > 0$, $1 \le i \le J$. To show that $\bar{p}_F$
is in the interior of $\Lambda_F$, we make use of two elementary properties of such cones:

C1. If the vector $a = (a_1, \ldots, a_d)$ is in a cone $\Lambda$, then so is the vector $ra = (ra_1, \ldots, ra_d)$
for any r > 0.

C2. If vectors $a = (a_1, \ldots, a_d)$ and $b = (b_1, \ldots, b_d)$ are in $\Lambda$, then so is the vector sum
$a + b = (a_1 + b_1, \ldots, a_d + b_d)$.

In other words, any positive linear combination of elements of the cone is itself
in the cone. Our proof works by showing that there is an $\epsilon$ such that any q with
$|\bar{p}_F - q| \le \epsilon$ can be constructed out of a positive linear combination of vectors $q(i,\beta_i)$
with $0 \le \beta_i \le c_i(F)$, $1 \le i \le J$. We begin by defining a set of key quantities.



$c_{min} = \min\{c_i(F) : 1 \le i \le J\}$

$p_{max} = \max\{p_i : 1 \le i \le J\}$

$p_{min} = \min\{p_i > 0 : 1 \le i \le J\}$

$\delta = \min\left\{\frac{1}{2},\ \frac{c_{min}}{4Jp_{max}}\right\}$

$\epsilon = \min\left\{\frac{p_{min}}{4},\ \left(\frac{c_{min}}{8J}\right)\left(\frac{p_{min}}{p_{max}}\right)\right\}$

Note that by hypothesis $c_{min} > 0$ and since F is a probability distribution there
must be some positive $p_i$'s and so $p_{min} > 0$. Hence $\delta$ and $\epsilon$ are also positive. Suppose
$q = (q_1, \ldots, q_J)$ is any vector with $|\bar{p}_F - q| \le \epsilon$. We will show that q can be constructed
out of a positive linear combination of vectors $q(i,\beta_i)$ as specified above.

Let $e_i = q_i - (1 - \delta)p_i$, $1 \le i \le J$. We first observe that all the $e_i$ are positive. This
is clearly true for all i such that $q_i \ge p_i$. Suppose $q_i < p_i$. In that case $p_i$ cannot be 0,
so we must have $p_i \ge p_{min}$. If $\delta = 1/2$ we have

$e_i = q_i - p_i + \delta p_i \ge \delta p_i - \epsilon \ge \frac{p_{min}}{2} - \frac{p_{min}}{4} = \frac{p_{min}}{4} > 0.$ (5.13)

If on the other hand $\delta = c_{min}/(4Jp_{max})$, then

$e_i \ge \delta p_i - \epsilon \ge \left(\frac{c_{min}}{4Jp_{max}}\right)p_{min} - \left(\frac{c_{min}}{8J}\right)\left(\frac{p_{min}}{p_{max}}\right) = \left(\frac{c_{min}}{8J}\right)\left(\frac{p_{min}}{p_{max}}\right) > 0.$ (5.14)

We next observe that for each i, $1 \le i \le J$,

$e_i \le \epsilon + \delta p_i \le \frac{c_{min}}{8J} + \frac{c_{min}}{4Jp_{max}}\,p_{max} < \frac{c_{min}}{2J}.$ (5.15)

Now consider the vectors $q(i,\beta_i)$, where $\beta_i = Je_i/(1 - \delta)$, $1 \le i \le J$. By (5.13)
through (5.15) and the definition of $\delta$, we have

$0 < \beta_i = \frac{Je_i}{1 - \delta} < 2J\left(\frac{c_{min}}{2J}\right) = c_{min}$

and so all these vectors are in $\Lambda_F$. Now consider the vector

$\bar{r} = (r_1, \ldots, r_J) = \frac{1 - \delta}{J}\sum_{i=1}^{J} q(i,\beta_i).$

Since $\bar{r}$ is a positive linear combination of vectors in $\Lambda_F$, it is itself in $\Lambda_F$ by (C1)
and (C2). But now note that for $1 \le i \le J$, we have

$r_i = \frac{1 - \delta}{J}\left(Jp_i + \beta_i\right) = (1 - \delta)p_i + e_i = q_i.$

Thus $q = \bar{r}$ and the latter is in $\Lambda_F$, as claimed. This implies that $\bar{p}_F$ is in the
interior of $\Lambda_F$ and the theorem is proved. ■



6 Handling Non-Perfectly Packable Distributions

In this section we consider the case when $EW_n^{OPT}(F) = \Theta(n)$. As we saw in Section 4,
we can have $ER_\infty^{SS}(F) > 1$ for such F. Fortunately, for each such F one can design a
distribution-specific variant on SS that performs much better. For notational simplicity
in what follows, we shall assume without loss of generality that the size vector s for F
has $s_1 = 1$. (If $1 \notin U_F$ then we simply set $p_1 = 0$.) Note also that we must have B > 1.

Theorem 6.1 For any discrete distribution F with $EW_n^{OPT}(F) = \Theta(n)$, there exists
a randomized variant $SS_F$ of SS such that $EW_n^{SS_F}(F) = EW_n^{OPT}(F) + O(\sqrt{n})$ and
hence $ER_\infty^{SS_F}(F) = 1$ by Lemmas 5.1 and 5.4. This algorithm has expected running
time O(nB) and can itself be constructed in time polynomial in B and the size of the
description of F.

Proof. Algorithm $SS_F$ is based on the solution to the Waste LP for F, and in particular
on the optimal solution value c(F), which by Theorem 5.5 can be computed
in time polynomial in B and the size of the description of F. The algorithm works
by performing a series of steps, with new steps being taken so long as an item in L
remains to be packed. At each step we flip a biased coin and according to the outcome
proceed as follows.

1. With probability 1/(1 + c(F)) we take the next item from L and pack it according
to SS.

2. With probability c(F)/(1 + c(F)) we generate a new "imaginary" item of size 1
and pack it according to SS.
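
The coin-flipping loop just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the value c(F) is taken as a given input (it would come from solving the Waste LP), SS is implemented by a plain scan over all levels rather than any faster structure, and the names `ss_place` and `pack_with_imaginary` are hypothetical.

```python
import random

def ss_place(counts, size, B):
    """Pack one item by the Sum-of-Squares rule.

    counts[h] is the number of open bins at level h (1 <= h <= B-1).
    Returns 1 if a new bin was opened, else 0.
    """
    best_h, best_change = None, None
    for h in range(0, B - size + 1):
        if h > 0 and counts[h] == 0:
            continue  # no partially filled bin at this level
        change = 0
        if h > 0:  # a bin leaves level h
            change += (counts[h] - 1) ** 2 - counts[h] ** 2
        if h + size < B:  # a bin arrives at level h + size
            change += (counts[h + size] + 1) ** 2 - counts[h + size] ** 2
        # "<=" breaks ties in favor of the larger level
        if best_change is None or change <= best_change:
            best_h, best_change = h, change
    if best_h > 0:
        counts[best_h] -= 1
    if best_h + size < B:
        counts[best_h + size] += 1
    return 1 if best_h == 0 else 0

def pack_with_imaginary(items, B, c, rng):
    """Sketch of SS_F: interleave real items with imaginary items of size 1."""
    counts = [0] * B
    bins_used = imaginary = 0
    i = 0
    while i < len(items):
        if rng.random() < 1.0 / (1.0 + c):
            bins_used += ss_place(counts, items[i], B)  # real item
            i += 1
        else:
            bins_used += ss_place(counts, 1, B)  # imaginary item
            imaginary += 1
    return bins_used, imaginary
```

With c = 0 no imaginary items are generated and the sketch reduces to plain SS; for c > 0 the number of imaginary items depends on the random generator passed in.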

Let $G_n$ denote the total size of the gaps in the packing of $L_n(F)$ by this algorithm,
and let $I_n$ denote the total size of the imaginary items in the packing. Then

$EW_n^{SS_F}(F) = \frac{E[G_n] + E[I_n]}{B}$ (6.1)

It is straightforward to determine $E[I_n]$. Divide the packing process into n phases,
each phase ending on a step in which a real rather than imaginary item is packed. The
expected number of imaginary items packed in each phase is

$\sum_{i=1}^{\infty} \left(\frac{c(F)}{1 + c(F)}\right)^i = c(F).$

We thus can conclude the expected total number of imaginary items is nc(F), and
since each is of size 1 we have $E[I_n] = nc(F)$.
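
As a numerical sanity check on the per-phase expectation, the sketch below evaluates two truncated series that should both approach c(F): the sum of the probabilities that at least i imaginary items occur in a phase, and the direct expectation of a geometric count. The function name and the truncation length `terms` are assumptions made for illustration.

```python
def expected_imaginary_per_phase(c, terms=400):
    """Return two truncated-series estimates of the expected number of
    imaginary items generated before the next real item."""
    r = c / (1.0 + c)  # probability that a step produces an imaginary item
    # Sum over i >= 1 of P(at least i imaginary items in the phase).
    tail_form = sum(r ** i for i in range(1, terms))
    # Sum over i >= 1 of i * P(exactly i imaginary items in the phase).
    direct_form = sum(i * r ** i * (1.0 - r) for i in range(1, terms))
    return tail_form, direct_form
```

Both values agree with c up to truncation and rounding error, which is the per-phase count that gets multiplied by n.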

Let us now turn to $E[G_n]$. Note that if we consider both real and imaginary
items, we are essentially packing a list generated by the distribution $F^+$ that has
$p_1^+ = (p_1 + c(F))/(1 + c(F))$ and $p_i^+ = p_i/(1 + c(F))$ for all i > 1.

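Claim 6.1.1 $EW_n^{OPT}(F^+) = O(\sqrt{n})$.

Proof of Claim. By Theorem 5.3 all we need show is that the solution to the Waste
LP for $F^+$ has $c(F^+) = 0$. Denote this LP by $LP_{F^+}$ and denote the Waste LP for F
by $LP_F$. Let $v_0(j,h)$ be the variable values in an optimal solution for $LP_F$, and for
$1 \le h \le B - 1$ define

$\Delta_h = \sum_{j=1}^{J} v_0(j, h - s_j) - \sum_{j=1}^{J} v_0(j, h).$

Define a new assignment v by

$v(1,h) = \frac{v_0(1,h) + \sum_{h'=1}^{h} \Delta_{h'}}{1 + c(F)}, \qquad v(j,h) = \frac{v_0(j,h)}{1 + c(F)} \ \text{for } j \ne 1,$

for $0 \le h \le B - 1$.

We claim that v satisfies the constraints of $LP_{F^+}$ and achieves 0 for the objective
function, this implying that $c(F^+) = 0$. It is easy to see that v satisfies constraints
(5.1), (5.2), and (5.3) for $j \ne 1$. For j = 1, we have

$\sum_{h=0}^{B-1} v(1,h) = \frac{1}{1 + c(F)}\left(\sum_{h=0}^{B-1} v_0(1,h) + \sum_{h=1}^{B-1}\sum_{h'=1}^{h} \Delta_{h'}\right) = \frac{1}{1 + c(F)}\left(p_1 + \sum_{h'=1}^{B-1} (B - h')\Delta_{h'}\right) = \frac{p_1 + c(F)}{1 + c(F)}$

as required. As for the constraints (5.4), we have for each h, $1 \le h \le B - 1$, that

$\sum_{j} v(j, h - s_j) - \sum_{j} v(j,h) = \frac{1}{1 + c(F)}\left(\Delta_h + \sum_{h'=1}^{h-1} \Delta_{h'} - \sum_{h'=1}^{h} \Delta_{h'}\right) = 0.$

Thus v is a feasible solution for $LP_{F^+}$. Finally, the value of the objective function is

$\sum_{h=1}^{B-1} (B - h)\left(\sum_{j} v(j, h - s_j) - \sum_{j} v(j,h)\right) = 0.$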

Thus $F^+$ is a perfectly packable distribution and by Lemma 2.2 the expected increase
in ss(P) during each step of algorithm $SS_F$ is less than 2, no matter what the
current packing looks like. For all i > 0 the expected increase during step i is thus less
than 2 times the probability $SS_F$ takes i or more steps. Since the expected number of
steps by the above argument about $E[I_n]$ is n(1 + c(F)), the expected value of ss(P)
when the algorithm terminates is thus no more than 2n(1 + c(F)). By Lemma 2.3
this implies that $E[G_n] \le B\sqrt{(B - 1)n(1 + c(F))} = O(B^2\sqrt{n})$ since $c(F) \le B - 1$ by
definition. Thus by (6.1) we have

$EW_n^{SS_F}(F) = \frac{E[G_n] + E[I_n]}{B} \le \frac{nc(F)}{B} + O(\sqrt{n})$

which by Lemma 5.1 is $EW_n^{OPT}(F) + O(\sqrt{n})$, as desired.



All that remains is to show that algorithm $SS_F$ can be implemented to run in time
O(nB). This is not immediate, since there are distributions F for which c(F) is as
large as $\lceil B/2 \rceil - 1$. Thus the total number of items packed (including imaginary ones)
can be $\Theta(nB)$, and the standard implementation of SS will take $\Theta(nB^2)$. We avoid
this problem by using a more sophisticated implementation that adds an additional
data structure to aid with the packing of the imaginary items.

This data structure is a doubly-linked list of doubly-linked lists $D_d$. If P is the
current packing, define $\delta_h = N_P(h + 1) - N_P(h)$, $0 \le h \le B - 1$, with $N_P(0)$ and $N_P(B)$
taken by convention to be 1/2 and $-1/2$ respectively. Then we know by Lemma 1.2
and the discussion that follows it that placing an item of size 1 into a bin of level h will
yield a smaller increase (or bigger decrease) in ss(P) than placing it in a bin of level h'
if and only if $\delta_h < \delta_{h'}$. At any given time in the packing process, there is a sublist $D_d$
for each value d taken on by some $\delta_h$, with that sublist containing representatives for
all those h such that $\delta_h = d$ and annotated by the value of d. The sublists are ordered
in the main list by increasing value of d. For each value of h, $0 \le h \le B - 1$, there is
a pointer to the list for $\delta_h$ and to the representative for h in that list.

Given this data structure, we can pack an item of size 1 in constant time: find
the first h in the first list $D_d$ and place the item into a bin of level h. Note that
this choice of h may violate the official tie-breaking rule for SS, which requires that in
case of ties, we should choose the largest h with $\delta_h = d$. However, as observed when
we originally specified the official tie-breaking rules, none of the performance bounds
proved in this paper depend on the precise tie-breaking rule used. Thus, we will still
have $ER_\infty^{SS_F}(F) = 1$ if $SS_F$ is implemented this way.
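
To see why ordering levels by $\delta_h$ is enough for items of size 1, the following sketch compares $\delta_h$ with the exact change in ss(P). It deliberately uses a linear scan rather than the constant-time list-of-lists structure, and it restricts attention to levels that actually hold a bin (or level 0, a new bin); that restriction and the function names are assumptions of this illustration. Under the stated conventions $N_P(0) = 1/2$ and $N_P(B) = -1/2$, the change in ss(P) works out to $2\delta_h + 2$ for every usable level.

```python
from fractions import Fraction

def deltas(counts, B):
    """delta_h = N(h+1) - N(h) for 0 <= h <= B-1, with N(0) = 1/2, N(B) = -1/2."""
    N = [Fraction(1, 2)] + [Fraction(counts[h]) for h in range(1, B)] + [Fraction(-1, 2)]
    return [N[h + 1] - N[h] for h in range(B)]

def ss_change(counts, B, h):
    """Exact change in ss(P) when a size-1 item enters a bin of level h
    (h = 0 means a new bin is started)."""
    change = 0
    if h > 0:  # level h loses a bin
        change += (counts[h] - 1) ** 2 - counts[h] ** 2
    if h + 1 < B:  # level h + 1 gains a bin
        change += (counts[h + 1] + 1) ** 2 - counts[h + 1] ** 2
    return change

def best_level(counts, B):
    """Usable level with the smallest delta_h, found by a linear scan."""
    d = deltas(counts, B)
    usable = [h for h in range(B) if h == 0 or counts[h] > 0]
    return min(usable, key=lambda h: d[h])
```

For example, with B = 5 and level counts 3, 1, 0, 2 at levels 1 through 4, the smallest $\delta_h$ among usable levels is at h = 4, where the item completes a bin.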

To complete the proof that this implementation takes O(nB) time overall, we must
show how to keep the data structure current with a constant amount of effort per item
packed. Here we exploit the fact that in packing a single item, only two counts get
changed, and no count changes by more than 1. Thus at most four $\delta_h$'s will change,
and no $\delta_h$ can change by more than 2. Thus all we need show is that if $\delta_h$ changes
by 2 or less, only a constant amount of work is required to update the data structure.
But this follows from the fact that if h is in $D_d$, then its new sublist can be at most two
sublists away in the overall doubly-linked list, either in an already-existing sublist to
which h can be prepended, or in a new sublist containing only h that can be created
in constant time. ■

An obvious drawback of the algorithms $SS_F$ is that we must know the distribution
F in advance. Fortunately, we can adapt the approach taken in these algorithms to
obtain a distribution-independent algorithm, simply by learning the distribution as we
go along. If we engineer this properly, we can get a randomized algorithm that matches
the best expected behavior we have seen in all situations:

Theorem 6.2 There is a randomized online algorithm SS* that for any discrete distribution
F with bin capacity B has the following properties:

(a) SS* runs in time O(nB).

(b) $EW_n^{SS^*}(F) = EW_n^{OPT}(F) + O(\sqrt{n})$.

(c) $ER_\infty^{SS^*}(F) = 1$.

(d) If $EW_n^{OPT}(F) = \Theta(\sqrt{n})$, then $EW_n^{SS^*}(F) = \Theta(\sqrt{n})$.

(e) If $EW_n^{OPT}(F) = O(1)$, then $EW_n^{SS^*}(F) = O(1)$.

Proof. Note that (d) will follow immediately from (b) and that (c) will follow from
(b) via Lemmas 5.1 and 5.4. Thus we only need prove (a), (b), and (e), which we will
do in that order.

As the basic building blocks of SS*, we will use a class of algorithms $SS_D^r$, $0 \le r < 1$
and $D \subseteq \{1, 2, \ldots, B - 1\}$, that capture the essence of the algorithms $SS_F$ of Theorem
6.1, modified slightly so that we can guarantee (e) above. Recall from Section 3.2 the
algorithm SS' that guaranteed $EW_n^{SS'}(F) = O(1)$ for all bounded waste distributions.
This algorithm made use of a parameterized packing rule $SS_D$, which packed so as
to minimize ss(P) subject to the constraint that no bin with a level in D should be
created unless this is unavoidable, in which case we start a new bin. Algorithm SS'
maintained a set U of all the item sizes seen so far, and used $SS_{D(U)}$ to pack items,
where D(U) is the set of dead-end levels for U, and SS* will do likewise.

Algorithm $SS_D^r$ works in steps, where in each step we flip a biased coin and proceed
as follows:

1. With probability 1 - r we take the next item from L and pack it according to
packing rule $SS_D$.

2. With probability r we generate a new "imaginary" item of size 1 and pack it
according to $SS_D$.

Note that if r = c(F)/(1 + c(F)), this is the same as $SS_F$ except for the modified
packing rule.

In algorithm SS* we maintain an auxiliary data structure of counts $X_i$, $1 \le i \le B - 1$,
where $X_i$ is the number of items of size i so far encountered in the list. From
this we can derive the set U of the item sizes actually seen so far, as well as the current
empirical distribution F', whose probability vector p' is $(X_1/N, X_2/N, \ldots, X_{B-1}/N)$,
where N is the number of items seen so far. The packing process consists of a sequence
of phases, during each of which we apply the packing rule $SS_{D(U)}^r$, where U is the set
of item sizes seen up to and including the first item to be packed in the phase and
r = c(F')/(1 + c(F')) for the empirical distribution F' at the beginning of the phase.

We start with a 0-phase. An i-phase terminates when either (a) we see a new item
size and have to update U and recompute D(U) or (b) we have packed a prespecified
number of real items during the phase, where the number is 10B for a 0-phase and
$30B \cdot 4^{i-1}$ for an i-phase, i > 0. If an i-phase is terminated by the arrival of an item with
a previously unseen size, the next phase is once again a 0-phase. Otherwise, it is an
(i + 1)-phase. If the new phase has a different value for U or r, we begin it by closing
all open bins. (A partially filled bin is considered open until it is closed. A closed bin
can receive no further items and does not contribute to the count for its level.) We
shall refer to phases that occur before all item sizes have been seen as false phases, and
ones that occur after as true phases. Note that once the true phases begin, each phase
(except possibly the last) packs 3 times as many items as the total number of items
packed in all previous true phases.
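
The factor of 3 in the last sentence follows from the phase quotas by simple arithmetic, which the short sketch below checks. It assumes every true phase runs to its full quota (only the final phase may be cut short); the function names are hypothetical.

```python
def phase_quota(i, B):
    """Number of real items packed in a full i-phase: 10B for i = 0,
    and 30B * 4**(i-1) for i > 0."""
    return 10 * B if i == 0 else 30 * B * 4 ** (i - 1)

def items_before_phase(i, B):
    """Real items packed in phases 0, 1, ..., i-1 when each runs to its quota."""
    return sum(phase_quota(k, B) for k in range(i))
```

The total packed before a full i-phase is $10B \cdot 4^{i-1}$, so the quota $30B \cdot 4^{i-1}$ is exactly three times that total.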

Note that this algorithm will have the claimed running time. The list-of-lists data
structure developed to enable the algorithms $SS_F$ to run in time O(nB) can be adapted
to handle the $SS_D^r$ packing rules, so the cumulative time spent running $SS_D^r$ for the
various values of D and r is O(nB). In SS* we have the added cost of re-initializing
this data structure from time to time when we close all open bins, which can take $\Theta(B)$
time, but this can happen no more than $J\log_4(n/10B)$ times. Thus the overall time for
reinitialization is $O(B^2 \log B \log n) = o(nB)$ for fixed B. The only other computation
time we need to worry about is that needed to solve the LP's used to compute the
values of c(F'). By Theorem 5.5, the time for the LP computed at the beginning of
an i-phase is $O(B^9 \log^2 D)$ where $D \le n$. Since there are no more than $J\log_4(n/10B)$
phases, the total time spent in solving the LP's is thus $O(B^{10} \log^3 n)$ and for fixed B
is again asymptotically dominated by the time to pack the items.

The proof that SS* satisfies (b) will proceed via a series of lemmas. In what follows,
if p and p' are two length-J vectors, we will use $\|p - p'\|$ to denote the distance
between them, that is,

$\|p - p'\| = \sum_{j=1}^{J} |p_j - p'_j|.$

Lemma 6.3 Suppose F and F' are two distributions over the same set $\{s_1, \ldots, s_J\}$ of
item sizes with probability vectors p and p'. Then

$|c(F) - c(F')| \le B\|p - p'\|.$ (6.2)

Proof. We show how to convert an optimal solution to the LP for F to a solution to
the LP for F' for which the objective function c satisfies

$c \le c(F) + B\|p - p'\|.$ (6.3)

A symmetric argument holds for the situation where the roles of F and F' are interchanged,
and so (6.2) will follow.

For the purposes of this proof, where $\{s_1, \ldots, s_J\}$ and B are fixed, we can view our
LP's as determined simply by the probability vectors for the distributions, p and p',
and write c(p) and c(p') for c(F) and c(F') respectively. We will convert an optimal
solution to the LP for p to a feasible one for p' via a series of steps.

For $0 \le j \le J$, let $p^j = (p_1^j, \ldots, p_J^j)$ be the vector with $p_i^j = p'_i$, $1 \le i \le j$, and
$p_i^j = p_i$, $j + 1 \le i \le J$. Note that $p^0 = p$ and $p^J = p'$. Let $LP_j$ denote the LP
for $p^j$. Note that these are legitimate LP's even though the intermediate vectors $p^j$,
0 < j < J, may not have $\sum_{i=1}^{J} p_i^j = 1$ and hence need not correspond to probability
distributions. We will show how to convert an optimal solution to $LP_{j-1}$ to a feasible
one for $LP_j$, $1 \le j \le J$, for which the objective function c satisfies

$c \le c(p^{j-1}) + B|p_j - p'_j|.$ (6.4)

Inequality (6.3) will then follow by induction.

So consider a feasible solution to $LP_{j-1}$. Note that the only constraint of $LP_j$
that is violated is the constraint of type (5.3) for j, i.e., the constraint that says that
$\sum_{h=0}^{B-1} v(j,h) = p'_j$. If $p'_j > p_j$, our task is simple. We simply add $p'_j - p_j$ to v(j, 0)
and leave all other variables unchanged. This will now satisfy the above constraint
for j while not causing any of the others to be violated. The increase in the objective
function will be $(B - s_j)|p'_j - p_j| \le B|p'_j - p_j|$, so (6.4) holds, as desired.

For the remaining case, suppose $p'_j < p_j$ and consider an optimal solution to $LP_{j-1}$
that maximizes the potential function $\sum_{h=0}^{B-1} h \cdot v(j,h)$. We claim that this solution
must be such that

for all levels h, if v(j, h) > 0, then $v(i, h + s_j) = 0$ for all $i \ne j$. (6.5)

Suppose not, and hence there is a level h and an integer $i \ne j$ such that v(j, h) > 0
and $v(i, h + s_j) > 0$. This means that a positive amount of size $s_j$ was placed in bins
with level h and then a positive amount of size $s_i$ was placed in bins with the resulting
level $h + s_j$. Let $\Delta = \min\{v(j,h), v(i, h + s_j)\}$, and modify the solution so that instead
of first placing an amount $\Delta$ of size $s_j$ in bins of level h and then adding $\Delta$ of size $s_i$, we do
these in reverse order. To be specific, revise v(j, h) to $v(j,h) - \Delta$, v(i, h) to $v(i,h) + \Delta$,
$v(i, h + s_j)$ to $v(i, h + s_j) - \Delta$ and $v(j, h + s_i)$ to $v(j, h + s_i) + \Delta$. It is not difficult
to see that this will not affect the objective function or any of the constraints, and so
the new set of variable values will continue to represent an optimal solution to $LP_{j-1}$.
Moreover, the potential function will have increased by $s_i\Delta$, a contradiction.

To convert the above optimal solution to one that is feasible for $LP_j$, we proceed
as follows. Let $H^* = \min\{H \le B : \sum_{h=H}^{B-1} v(j,h) \le p_j - p'_j\}$. Set v(j, h) = 0, $H^* \le h \le B - 1$,
and reduce $v(j, H^* - 1)$ by $p_j - p'_j - \sum_{h=H^*}^{B-1} v(j,h)$. The resulting solution
will now satisfy the constraint of type (5.3) for j in $LP_j$. It will continue to satisfy the
constraints of type (5.4) because of (6.5). Finally, the increase in the objective function
will be at most $s_j|p_j - p'_j| \le B|p_j - p'_j|$ and so (6.4) again holds, as desired. ■

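Definition 6.4 If p is a probability vector and $r \ge 0$, then aug(p, r) is the probability
vector q with

$q_j = \begin{cases} \dfrac{p_1 + r}{1 + r} & \text{if } j = 1 \\ \dfrac{p_j}{1 + r} & \text{otherwise} \end{cases}$

Lemma 6.5 Suppose F is a discrete distribution with probability vector p. Let $r, r' \ge 0$
and define q = aug(p, r) and q' = aug(p, r'). Then

$\|q - q'\| \le 2|r - r'|.$

Proof. By definition,

$\|q - q'\| = \left|\frac{p_1 + r}{1 + r} - \frac{p_1 + r'}{1 + r'}\right| + \sum_{j=2}^{J} \left|\frac{p_j}{1 + r} - \frac{p_j}{1 + r'}\right|$

$\le \sum_{j=1}^{J} p_j \left|\frac{1}{1 + r} - \frac{1}{1 + r'}\right| + \left|\frac{r}{1 + r} - \frac{r'}{1 + r'}\right|$

$= \left|\frac{(1 + r') - (1 + r)}{(1 + r')(1 + r)}\right| + \left|\frac{r(1 + r') - r'(1 + r)}{(1 + r')(1 + r)}\right|$

$\le 2|r - r'|$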
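
The augmentation of Definition 6.4 and the bound of Lemma 6.5 are easy to check on a small instance. The sketch below uses exact rational arithmetic; the helper names `aug` and `l1_distance` and the sample vector are illustrative assumptions only.

```python
from fractions import Fraction

def aug(p, r):
    """aug(p, r) of Definition 6.4: add mass r to the first entry
    (the size-1 item) and renormalize by 1 + r."""
    return [(p[0] + r) / (1 + r)] + [pj / (1 + r) for pj in p[1:]]

def l1_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
q = aug(p, Fraction(1, 3))
q_prime = aug(p, Fraction(1, 2))
```

Here q = (7/16, 3/8, 3/16) and q' = (1/2, 1/3, 1/6), so the distance is 1/8, comfortably below $2|r - r'| = 1/3$.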



Lemma 6.6 Suppose F is a discrete distribution with $s = (s_1, \ldots, s_J)$, and F' is the
empirical distribution measured after sampling n items with sizes chosen according to
F for some n > 0. Let q = aug(p, c(F)) and q' = aug(p, c(F')). Then for all $\beta > 0$,

(a) $P\left(\|p - p'\| > \frac{J\beta}{\sqrt{2n}}\right) \le 2Je^{-\beta^2}$

(b) $P\left(|c(F) - c(F')| > \frac{JB\beta}{\sqrt{2n}}\right) \le 2Je^{-\beta^2}$

(c) $P\left(\|q - q'\| > \frac{2JB\beta}{\sqrt{2n}}\right) \le 2Je^{-\beta^2}$

Proof. By a straightforward application of the Chernoff bound, as described for example
in [AS92, pp. 234-236], we have that for all j, $1 \le j \le J$, and $\beta > 0$,

$P\left(|p_j - p'_j| > \frac{\beta}{\sqrt{2n}}\right) \le 2e^{-\beta^2}.$

Thus the probability that the bound is exceeded for at least one j is no more than
$2Je^{-\beta^2}$. However, if $\|p - p'\| > J\beta/\sqrt{2n}$ then the bound must be exceeded for some j.
Hence conclusion (a) holds. Conclusions (b) and (c) follow by Lemmas 6.3 and 6.5. ■



Lemma 6.7 Suppose F and F' are discrete distributions over the same size vector
$s = (s_1, \ldots, s_J)$, q = aug(p, c(F)), q' = aug(p, c(F')), and r' = c(F')/(1 + c(F')). Suppose
$q_{min}$ is the smallest nonzero entry in q and $\|q - q'\| < q_{min}$. Then if the algorithm
$SS_{D(U_F)}^{r'}$ is applied to a list L of n items generated according to F, the resulting packing
P of L plus the imaginary items created by $SS_{D(U_F)}^{r'}$ satisfies

$E[W(P)] = O(\max\{n\|q - q'\|, \sqrt{n}\}).$

Proof. Since $\|q - q'\| < q_{min}$, we have that for all j with $q_j > 0$, $q'_j \ge q_j - \|q - q'\| \ge
q_j(1 - \|q - q'\|/q_{min}) > 0$. Let $\delta = \|q - q'\|/q_{min}$. Then for all j we have $q'_j \ge (1 - \delta)q_j \ge 0$.

Suppose items are generated according to F and we use $SS_{D(U_F)}^{r'}$ to pack them.
At each step, we will thus be using $SS_{D(U_F)}$ to pack an item that looks as if it were
generated according to the probability vector q'. Let us view the packing process as
follows: When an item of size $s_j$ arrives, randomly classify it as a good item with
probability $(1 - \delta)q_j/q'_j$ and as a bad item with probability $1 - (1 - \delta)q_j/q'_j$. Note that
if one restricts attention to the good items, they now arrive as if generated according
to q. Further note that by Claim 6.1.1 of Theorem 6.1, the distribution determined by
q is a perfectly packable distribution. Thus for these arrivals we can apply Lemma 2.2,
which we have already shown applies to $SS_{D(U_F)}$ as well as SS. Thus we can conclude
that the expected increase in ss(P) each time a good item is packed is less than 2.
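
The good/bad classification is a thinning argument, and a small exact computation makes it concrete. The sketch below is only an illustration with hypothetical names and made-up vectors: it computes $\delta$ and the per-size probability of being classified good, so that one can confirm the good items are distributed according to q.

```python
from fractions import Fraction

def classify_probabilities(q, q_prime):
    """Return (delta, good) where good[j] is the probability that an arriving
    item of size s_j, drawn according to q_prime, is classified as good."""
    q_min = min(x for x in q if x > 0)
    dist = sum(abs(a - b) for a, b in zip(q, q_prime))
    if dist >= q_min:
        raise ValueError("the lemma assumes ||q - q'|| < q_min")
    delta = dist / q_min
    good = [(1 - delta) * qj / qpj if qpj > 0 else Fraction(0)
            for qj, qpj in zip(q, q_prime)]
    return delta, good

q = [Fraction(1, 2), Fraction(1, 2)]
q_prime = [Fraction(5, 8), Fraction(3, 8)]
delta, good = classify_probabilities(q, q_prime)
# Probability mass of good items of each size, and their conditional distribution.
good_mass = [qp * g for qp, g in zip(q_prime, good)]
conditional = [m / sum(good_mass) for m in good_mass]
```

In this example $\delta = 1/2$, an arriving item is good with total probability $1 - \delta$, and conditioned on being good its size follows q exactly.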

Let D denote the constant $(1 + q_{min})/q_{min}$. The probability that a random item is
a bad item is

$\sum_{j=1}^{J} \left(1 - \frac{(1 - \delta)q_j}{q'_j}\right) q'_j = \sum_{j=1}^{J} \left(q'_j - q_j + \delta q_j\right) \le \|q - q'\| + \delta = D\|q - q'\|.$

For bad items, the worst-case increase in ss(P) is less than $2\max_j\{N_P(j)\} + 2$, an
upper bound by Lemma 1.2 on the increase that would occur if our placement caused
the maximum count to increase. Thus the expected increase in ss(P) is less than

$2\left(1 + D\|q - q'\| \max_j\{N_P(j)\}\right)$ (6.6)



Let $P_i$ be the packing after i items have been packed and let i(t), $1 \le t \le n$, be
the index of the packing that results when the t-th real item is packed, with i(0) = 0
by convention. Define

$Max_t = \max\{1, N_{P_{i(t)}}(j) : 1 \le j \le J\}, \quad 0 \le t \le n$

$MaxE = \max\{E[Max_t] : 0 \le t \le n\}$

Claim 6.7.1 For all t, $0 \le t < n$, and all i, $i(t) \le i < i(t + 1)$, the maximum level
count in $P_i$ is at most $Max_t$.

Proof of Claim. The claim holds by definition for $P_{i(t)}$. Suppose it holds for packing
$P_i$ and i + 1 < i(t + 1), i.e., the next item to be packed is imaginary. Note that the
fact that imaginary items (of size 1) can be generated implies that there are no dead-end
levels. Since $SS_{D(U_F)}^{r'}$ by assumption knows this, this means that it is not forbidden
from making any legal move by its requirement to avoid creating dead-end levels, and
must make an improving move whenever one exists. Suppose the current packing has
a count greater than 0 and j is the level with the biggest count, ties broken in favor of
larger levels. Then there is at least one bin with level j and placing an item of size 1
into such a bin will decrease ss(P). Thus $SS_{D(U_F)}^{r'}$ must choose a placement that decreases
ss(P). This cannot increase the largest level count. Suppose on the other hand that
the current packing has no level count exceeding 0. Then placing an imaginary item
will only increase the maximum level count from 0 to 1, which is still no more than
$Max_t$. In both cases, we are left with a packing in which no count exceeds $Max_t$.
The claim follows by induction. ■

Claim 6.7.2 For $0 \le t < n$,

$E[ss(P_{t+1}) - ss(P_t) \mid P_t] \le 2B(1 + D\|q - q'\| MaxE).$

Proof of Claim. For each $k \ge 0$, the probability that there are more than k items
packed in going from $P_t$ to $P_{t+1}$ is $(c(F')/(1 + c(F')))^k$. Given that there are more
than k items packed, the expected increase in ss(P) due to the packing of the (k + 1)st
item is by (6.6), Claim 6.7.1, and the definitions of $Max_t$ and MaxE at most

$2(1 + D\|q - q'\| E[Max_t]) \le 2(1 + D\|q - q'\| MaxE).$

The total expected increase in going from $P_t$ to $P_{t+1}$ is thus at most

$\sum_{k=0}^{\infty} \left(\frac{c(F')}{1 + c(F')}\right)^k 2(1 + D\|q - q'\| MaxE) = 2(1 + c(F'))(1 + D\|q - q'\| MaxE).$

The claim follows since by definition $c(F) \le B - 1$ for all distributions F. ■
Thus by the linearity of expectations we can conclude that for $1 \le t \le n$

$E[ss(P_t)] \le 2Bt(1 + D\|q - q'\| MaxE)$ (6.7)

and, by inequality (2.4) in the proof of Lemma 2.3, that

$E[Max_t] \le E\left[1 + \sqrt{\sum_{h=1}^{B-1} N_{P_t}(h)^2}\right] \le 1 + \sqrt{E[ss(P_t)]}$

$\le 1 + \sqrt{2Bt(1 + D\|q - q'\| MaxE)}$

$\le 2\sqrt{Bn(1 + D\|q - q'\| MaxE)}$

and hence

$MaxE \le 2\sqrt{Bn(1 + D\|q - q'\| MaxE)}.$ (6.8)

If D\\q — q'\\MaxE < 1, we have E[ss{Pn)] < ABn by (|6.71 ). So by Lemma |2.3| we 
have 



E[W{Pn)] < VB ■ E{ss{Pn)] < 2BV^ 
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Otherwise we have by ( |6.8| ) that MaxE < 2^y2BDn\\q — q'W^J MaxE]. But this im- 
phes MaxE < 8BDn\\q — q'\\, and consequently by 



E[ss{Pn)] < 2Bn + lQ{BD\\q- q\\nf 



and hence by Lemma |2.3| that 

E[W{Pn)] < ^/BE[ss{Pn)] = 0{n\\q-q'\\) 
for fixed F. Thus EW^^" {F') = 0{max{n\\q — q'\\, \/n}) and Lemma 3?7 is proved. 



We can now address part (b) of Theorem 6.2. Let us divide the waste created by
SS* into three components. Let $n_A$ denote the number of items seen before all sizes in
$U_F$ have appeared.

• Waste in bins created during the packing of the first $n_A$ items (during what we
called false phases).

• Waste in bins created after the first $n_A$ items have been packed, either during the
0-phase or during an i-phase, i > 0, for which $\|q - q'\| \ge q_{min}$ in the terminology
of Lemma 6.7 (Type 1 true phases).

• Waste in the remaining bins (Type 2 true phases).

For waste in bins created during false phases, we first determine a bound on $E[n_A]$.
The analysis is similar to that used in the proof of Theorem p.lO| . The probability 



that we have not seen all item sizes after the h-th item arrives is at most $J(1 - p_{min})^h$. If we
choose the smallest t such that $J(1 - p_{min})^t \le 1/2$, then for each integer $m \ge 0$, the
probability that all the item sizes have not been seen after mt items have arrived is at
most $1/2^m$. Thus for each $m \ge 0$, the probability that $n_A \in (mt, (m + 1)t]$ is at most
$1/2^m$. Hence

$E[n_A] \le \sum_{m=0}^{\infty} (m + 1)t \cdot P\left[n_A \in (mt, (m + 1)t]\right] \le t \cdot \sum_{m=0}^{\infty} \frac{m + 1}{2^m} = 4t.$

Thus the expected false phase waste resulting from bins that contain at least one real
item is bounded by 4t(B - 1)/B.
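
The constant 4 comes from the series $\sum_{m \ge 0} (m + 1)/2^m$, and t depends only on J and $p_{min}$. The following sketch, with hypothetical function names, computes t by direct search and evaluates a long partial sum of the series; it assumes $p_{min} > 0$ and reads the threshold condition as "at most 1/2".

```python
def smallest_t(J, p_min):
    """Smallest positive integer t with J * (1 - p_min)**t <= 1/2.
    Assumes 0 < p_min <= 1, otherwise the loop would not terminate."""
    t = 1
    while J * (1.0 - p_min) ** t > 0.5:
        t += 1
    return t

def series_partial_sum(terms=200):
    """Partial sum of sum_{m >= 0} (m + 1) / 2**m, whose limit is 4."""
    return sum((m + 1) / 2 ** m for m in range(terms))
```

For instance, with J = 3 sizes and $p_{min} = 1/2$ the search gives t = 3, so the bound on $E[n_A]$ is 12 items.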

The only other possible waste during false phases consists of 1 unit of waste for each
bin containing only imaginary items. The expected number of imaginary items that
arrive before all item sizes have been seen is bounded by $(n_A + 1)c(F_{max})$, where $F_{max}$
is the empirical distribution F' that has the largest value of c(F') among all those
computed before all item sizes have been seen. Since $c(F') \le B - 1$ for all distributions
F' this is at most (4t + 1)(B - 1). Moreover, all but one of the bins containing only
imaginary items that are started during a given phase must be completely full: as
already remarked, if there are any partially filled bins when an imaginary item (of size
1) arrives, then placing it in a bin whose level has the largest count (ties broken in favor
of higher levels) will cause a decrease in ss(P) and hence is to be preferred to starting
a new bin. Thus the expected number of bins containing only imaginary items is at
most (4t + 1)(B - 1)/B plus the expected number of false phases. Since the number
of false phases is clearly less than $n_A/(10B) + J$, the total expected waste during false
phases is at most 8t + J + 1 = O(1) for fixed F.

We now turn to the Type 1 true phases. The first of these is the true 0-phase,
which is Type 1 by definition. In this phase the expected number of real items packed
is at most 10B and the expected waste is at most 20B + 2 by an argument like that in
the previous paragraph.

By a similar argument, if there is a true i-phase, i > 0, the number of real items
packed in it is at most $30B \cdot 4^{i-1}$ and the expected waste during the phase is at most
$60B \cdot 4^{i-1} + 2 \le 16B \cdot 4^i$. Whether this phase contributes to the Type 1 waste depends
on the empirical distribution F' measured at the beginning of the phase. In particular,
we must have $\|q - q'\| \ge q_{min}$.

Now the distribution F' is based on at least $10B \cdot 4^{i-1}$ samples from F. Thus by
Lemma 6.6(c), the probability that $\|q - q'\| > \sqrt{2}JB\beta/\sqrt{2.5B4^i}$ is bounded by $2Je^{-\beta^2}$.
Thus the probability that $\|q - q'\| \ge q_{min}$ is at most $2Je^{-1.25 q_{min}^2 4^i/(J^2B)} = 2Jd^{-4^i}$, where
$d = e^{1.25 q_{min}^2/(J^2B)} > 1$ is a constant independent of i. The expected waste that this phase
can produce by being a Type 1 phase is thus at most $(32BJ)(4^i/d^{4^i})$. Summing over
all true phases we conclude that the total expected waste for Type 1 phases is at most

$20B + 2 + 32BJ\sum_{i=1}^{\infty} \frac{4^i}{d^{4^i}} = O(1).$



Finally, let us turn to the waste during Type 2 true phases. Suppose the true $i$-phase,
$i > 0$, is of Type 2, and let $F'$ be the empirical distribution at the beginning
of the phase, with $p'$ being its probability vector. $F'$ must have been based on the
observation of at least $10B \cdot 4^{i-1}$ items generated according to $F$. Thus by Lemma
6.6(b) there are constants $\alpha$ and $\gamma$ depending on $F$ but independent of $i$ such that

\[ E\bigl[|c(F) - c(F')|\bigr] < \gamma/\sqrt{2.5B4^i} = \alpha 2^{-i}. \]

Let $N_i$ be the number of real items packed during the true $i$-phase, and recall that
$N_i \le 30B \cdot 4^{i-1}$. This means that the expected waste due to imaginary items created
during the phase is at most

\[ \frac{N_i c(F')}{B} \le \frac{N_i\bigl(c(F) + \alpha 2^{-i}\bigr)}{B} \le \frac{N_i c(F)}{B} + 7.5\alpha 2^i. \]

Note that the total number of true phases is at most $\lceil \log_4(n/(10B)) \rceil \le \lfloor \log_4 n \rfloor =
\lfloor (1/2)\log_2 n \rfloor$. Thus even if all such phases are of Type 2, we have that the expected
total waste during the Type 2 phases due to imaginary items is bounded by

\[ \frac{nc(F)}{B} + 7.5\alpha \sum_{i=1}^{\lfloor \log_4 n \rfloor} 2^i < \frac{nc(F)}{B} + 15\alpha\sqrt{n} = \frac{nc(F)}{B} + O(\sqrt{n}). \]
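The step from the geometric sum to the $15\alpha\sqrt{n}$ term in the bound on the imaginary-item waste can be spelled out as follows; this is a supplementary check, with $k$ introduced here only as shorthand for $\lfloor \log_4 n \rfloor$.

```latex
\[
\sum_{i=1}^{k} 2^i = 2^{k+1} - 2 < 2 \cdot 2^{\log_4 n} = 2\, n^{\log_4 2} = 2\sqrt{n},
\qquad k = \lfloor \log_4 n \rfloor,
\]
and multiplying by $7.5\alpha$ gives the term $15\alpha\sqrt{n}$.
```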
Now let us consider the waste caused by empty space in the bins packed during true
phases of Type 2. First note that the set of items contained in open bins at the end
of the $i$-phase consists of all items packed during this phase plus possibly items from
immediately preceding true phases that operated with the same value of $r$. Even if all
preceding true phases operated with the same value of $r$, this could be no more than
$10B \cdot 4^i$ items. Moreover, as argued above we know that the empirical distribution $F'$
computed at the beginning of the $i$-phase has $E[|c(F) - c(F')|] < \alpha 2^{-i}$ for some fixed
$\alpha$, so that by Lemma 6.3, $E[\|q - q'\|] < 2\alpha 2^{-i}$. Since this is a Type 2 phase, we have
by definition that $\|q - q'\| < q_{min}$ and so Lemma applies and we can conclude that
there is a constant $\gamma$ such that the expected empty space in the packing is bounded by

\[ \gamma \max\bigl\{(10B4^i)(2\alpha 2^{-i}),\ \sqrt{10B4^i}\bigr\} = O(2^i). \]

Thus the expected total empty space of this kind over all true phases of Type 2 is
once again $O(\sqrt{n})$, and so the expected total waste in bins started in Type 2 true phases
(empty space plus imaginary items) is $nc(F)/B + O(\sqrt{n})$. Given that the expected
waste in false phases and in true phases of Type 1 was bounded, this means that

\[ EW_n^{SS^*}(F) = \frac{nc(F)}{B} + O(\sqrt{n}) \]



which by Theorem 5^ means that Claim (b) of Theorem |6.2| has been proved. 

It remains to prove Claim (e), that $EW_n^{SS^*}(F) = O(1)$ whenever $EW_n^{OPT}(F) =
O(1)$, i.e., whenever $F$ is a bounded waste distribution. Suppose $F$ is a bounded waste
distribution with size vector $s$ and probability vector $p$. From the Courcoubetis-Weber
Theorem, we know that there is an $\epsilon > 0$ such that any distribution $F'$ over the same
set of item sizes that has a probability vector $p'$ satisfying $\|p - p'\| < \epsilon$ is a perfectly
packable distribution and hence has $c(F') = 0$ by Theorem 5.6.

Once again, we can divide the waste produced in an SS* packing of a list generated 
according to F into three components, although this division is somewhat different. 

• Waste in bins created during false phases. 

• Waste in bins created in true phases through the last such phase in which the
starting empirical distribution $F'$ had $c(F') > 0$.

• Waste created in all subsequent phases. 

As in the analysis of Claim (b), we can conclude that the total expected waste for the 
false phases is bounded. 

Consider now the waste created in true phases through the last phase that started
with $c(F') > 0$. If this was the true 0-phase, the expected waste is bounded by $20B + 2$,
again as argued in Claim (b). If it was the true $i$-phase, $i > 0$, then at most $10B \cdot 4^i$
items can have been packed in true phases through this point, and so the expected
waste would be at most $80B \cdot 4^i + 2 < 81B \cdot 4^i$ by an analogous argument. Now the
probability that the $i$-phase is the last phase with $c(F') > 0$ is clearly no more than
the probability that it simply had $c(F') > 0$. As remarked above, this can only have
happened if $\|p - p'\| > \epsilon$. Since the empirical distribution at the start of the $i$-phase,
$i > 0$, is based on at least $10B \cdot 4^{i-1}$ samples from $F$, by Lemma 6.6(a), the probability
that $\|p - p'\| > \epsilon$ is at most $2Jd^{-4^i}$ for some $d > 1$. Thus the total
expected waste through the last true phase with $c(F') > 0$ is at most



\[ 20B + 2 + 81B \sum_{i=1}^{\infty} 4^i \cdot \frac{2J}{d^{4^i}} = O(1). \]



Finally, if there are any phases after the last one that had $c(F') > 0$ and hence
$r > 0$, let the first such phase be the $i_0$-phase. This phase begins by closing all
previously open bins because $r$ has just changed from a positive value to 0. From now
on, however, no more bin closures will take place since $r = 0$ for all remaining phases
and hence never changes. Thus the packing beginning with the $i_0$-phase is simply an
^^d{Uf) ~ S^DiUp) packing of items generated according to F, and by Theorem p. 10 
has 0(1) expected waste. 

Thus the total expected waste under SS* is $O(1)$, Claim (e) holds, and Theorem
6.2 is proved. ■



7 SS and Adversarial Item Generation 

The results for SS* in the previous section are quite general with respect to the context
traditionally studied by papers on the average case analysis of bin packing algorithms:
the standard situation in which item sizes are chosen as independent samples from
the same fixed distribution $F$. However, that context itself is somewhat limited, in
that one can conceive of applications in which some dependence exists between item
sizes. Perhaps surprisingly, the arguments used to prove Theorems 2.4 and 3.4 imply
that SS itself can do quite well in some situations where there is dependence and that
dependence is controlled by an adversary.

Suppose that our item generation process works as follows: Let $B$ be a fixed bin
size. For each item $x_i$, $i = 1, 2, \ldots$, the size of item $x_i$ is chosen according to a discrete
distribution $F_i$ with bin size $B$. The choice of $F_i$, however, is allowed to be made by
an adversary, given full knowledge of all item sizes chosen so far, the current packing,
and the packing algorithm we are using. It would be difficult to do well against such
an adversary unless it were somehow restricted, so to introduce a plausible restriction,
let us say that such an adversary is restricted to $\mathcal{F}$, where $\mathcal{F}$ is a set of discrete
distributions, if all the $F_i$ used must come from $\mathcal{F}$. As a simple corollary of the proof
of Theorem 2.4 we have the following.
Theorem 7.1 Let $B$ be a given bin size and suppose items are generated by an adversary
restricted to the set of all perfectly packable distributions for bin size $B$. Then the
expected waste under SS is $O(\sqrt{n})$.



Proof. By Lemma 2.2 we know that $E[ss(P)]$ increases by less than 2 whenever SS
packs an item whose size is generated by a perfectly packable distribution. Thus we
can conclude that if we pack $n$ items generated by our adversary, we still must have
$E[ss(P)] < 2n$. The rest follows by Lemma 2.3, as in the proof of Theorem 2.4. ■

Note that without the restriction to perfectly packable distributions, the adversary
could force the optimal expected waste to be linear, so Theorem 7.1 is in a sense the
strongest possible result of this sort. With even more severe restrictions on $\mathcal{F}$, one can
guarantee bounded expected waste against an adversary.
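The adversarial model can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the example family `EXAMPLE_FAMILY` (three perfectly packable distributions for $B = 10$), and the rule by which the adversary picks a distribution are all assumptions made for this example. SS is implemented directly on the level counts, with ties broken in favor of the higher level, and waste is measured in bins.

```python
import random

def ss_place(counts, B, s):
    """Level of the bin SS chooses for an item of size s (0 means a new bin)."""
    best_h, best_delta = 0, None
    for h in range(0, B - s + 1):
        if h > 0 and counts[h] == 0:
            continue                        # no partially filled bin at level h
        delta = 0
        if h > 0:
            delta += 1 - 2 * counts[h]      # N_P(h) drops by one
        if h + s < B:
            delta += 2 * counts[h + s] + 1  # N_P(h+s) rises by one
        if best_delta is None or delta <= best_delta:  # "<=" favors higher levels
            best_h, best_delta = h, delta
    return best_h

def pack(counts, B, s):
    """Pack one item with SS and update the level counts in place."""
    h = ss_place(counts, B, s)
    if h > 0:
        counts[h] -= 1
    if h + s < B:
        counts[h + s] += 1
    return h

def run_adversary(B, family, choose, n, seed=0):
    """Pack n items; before each item the (hypothetical) adversary rule
    `choose` inspects the current counts and returns an index into `family`."""
    rng = random.Random(seed)
    counts = [0] * B                        # counts[h] = N_P(h); index 0 unused
    for _ in range(n):
        dist = family[choose(counts)]
        sizes, probs = zip(*dist.items())
        pack(counts, B, rng.choices(sizes, probs)[0])
    # Waste in bins: empty space in partially filled bins, divided by B.
    return sum((B - h) * counts[h] for h in range(1, B)) / B

# Three perfectly packable distributions for B = 10 (illustrative choice).
EXAMPLE_FAMILY = [{2: 0.5, 8: 0.5}, {5: 1.0}, {3: 2 / 3, 4: 1 / 3}]
```

A call such as `run_adversary(10, EXAMPLE_FAMILY, lambda c: sum(c) % 3, 10000)` returns the final waste for one run; averaging over seeds gives an empirical estimate that can be compared against $\sqrt{n}$.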

Theorem 7.2 Suppose $\mathcal{F}$ is a set of bounded waste distributions none of which has
nontrivial dead-end levels, and there is an $\epsilon > 0$ such that every distribution that is
within distance $\epsilon$ of a member of $\mathcal{F}$ is a perfectly packable distribution. Then if items are
generated by an adversary restricted to $\mathcal{F}$, the expected waste under SS is $O(1)$.



Proof. This follows from the proof of Theorem 3.4, since the general hypothesis of
Hajek's Lemma allows for adversarial item generation. Essentially the same proof as
was used to show Theorem 3.4 applies. ■

Theorem 7.2 seems very narrow, but it has an interesting corollary.



Corollary 7.2.1 Suppose $\mathcal{F} = \{U\{j,k\} : 1 \le j \le k-2\}$ for some fixed $k > 0$. Then
if items are generated by an adversary restricted to $\mathcal{F}$, the expected waste under SS is
$O(1)$.

Proof. As shown in [CCG+00, CCG+02], $EW_n^{OPT}(F) = O(1)$ for all these distributions,
and so by the Courcoubetis-Weber theorem for each $j$, $1 \le j \le k-2$, there is an
$\epsilon_j > 0$ such that all distributions $F$ within distance $\epsilon_j$ of $U\{j,k\}$ are perfectly packable
distributions. We simply take $\epsilon = \min\{\epsilon_j : 1 \le j \le k-2\}$ and apply Theorem 7.2. ■



If we omit from Theorem 7.2 the requirement that the distributions in $\mathcal{F}$ have no
nontrivial dead-end levels, then the best upper bound on the expected waste for SS
grows to $O(\log n)$, as follows from the proof of Theorem 3.11. Note that we cannot
improve this to $O(1)$ by using SS' instead of SS as we did in the non-adversarial
case. For example, the adversary could generate its first item using the distribution
that yields items of size 1 with probability 1, and then switch to a bounded waste
distribution with nontrivial dead-end levels. SS', having seen an item of size 1, would
conclude that $1 \in U_F$ and hence that there are no dead-end levels. So from then on it
would pack exactly as SS would and hence would produce $\Omega(\log n)$ waste as implied
by the lower bound in Theorem 3.11.
8 The Effectiveness of Variants on SS



In this section we return to the standard model for item generation, and ask how much 
of the good behavior of SS depends on the precise details of the algorithm. It turns 
out that SS is not unique in its effectiveness, and we shall identify a variety of related 
algorithms A that share one or more of the following sublinearity properties with SS 
(where (a) is a weaker form of (b)): 

(a) [Sublinearity Property]. If $EW_n^{OPT}(F) = O(\sqrt{n})$, then $EW_n^A(F) = o(n)$.

(b) [Square Root Property]. If $EW_n^{OPT}(F) = O(\sqrt{n})$, then $EW_n^A(F) = O(\sqrt{n})$.

(c) [Bounded Waste Property]. If $EW_n^{OPT}(F) = O(1)$ and $F$ has no nontrivial dead-end
levels, then $EW_n^A(F) = O(1)$.



8.1 Objective functions that take level into account 

One set of variants on SS are those that replace the objective function ss{P) by a 
variant that multiplies the squared counts by some function depending only on B and 
the corresponding level, and then packs items so as to minimize this new objective 
function. Examples include 

\[ \sum_{h=1}^{B-1} N_P(h)^2 (B-h), \qquad \sum_{h=1}^{B-1} \bigl[N_P(h)(B-h)\bigr]^2, \qquad \text{and} \qquad \sum_{h=1}^{B-1} \bigl[\,\ldots\,\bigr]^2. \]



The first of the above three variants was proposed in 1996 by David Wilson [|Wil 
before we had invented the algorithm SS itself. Wilson's unpublished experiments with
this algorithm already suggested that it satisfied the Square Root and Bounded Waste 
Properties for the U{j, k} distributions, a claim we can now confirm as a consequence 
of the following more general result. 

Theorem 8.1 Suppose $f(h,B)$ is any function of the level and bin capacity, and $A$
is the algorithm that packs items so as to minimize $\sum_{h=1}^{B-1} N_P(h)^2 f(h,B)$. Then $A$
satisfies the Square Root and Bounded Waste Properties.



Proof. Such algorithms satisfy the Square Root Property, since by Lemma 2.2 the
expected increase in the objective function at each step is still bounded by a constant
($2\max\{f(h,B) : 1 \le h \le B-1\}$). They satisfy the Bounded Waste Property, since



the proof of Theorem need only be modified to change some of the constants used 



in the arguments. Details are left to the reader. 
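A level-weighted variant of this kind differs from SS only in how the change in the objective is scored. The following sketch is a minimal illustration under assumed names (`weighted_place` and the weight function `f` are hypothetical helpers, not from the paper); it evaluates the change in $\sum_h N_P(h)^2 f(h,B)$ for every feasible placement and breaks ties toward the higher level.

```python
def weighted_place(counts, B, s, f):
    """Level (0 = new bin) minimizing the change in sum_h N_P(h)^2 * f(h, B)."""
    best_h, best_delta = 0, None
    for h in range(0, B - s + 1):
        if h > 0 and counts[h] == 0:
            continue                                        # no bin at this level
        delta = 0
        if h > 0:
            delta += f(h, B) * (1 - 2 * counts[h])          # N(h) -> N(h) - 1
        if h + s < B:
            delta += f(h + s, B) * (2 * counts[h + s] + 1)  # N(h+s) -> N(h+s) + 1
        if best_delta is None or delta <= best_delta:       # ties go to higher level
            best_h, best_delta = h, delta
    return best_h

def wilson(h, B):
    """Weight B - h, as in the first example objective."""
    return B - h

def plain(h, B):
    """Constant weight, which recovers ordinary SS."""
    return 1
```

With $B = 10$, two bins at level 1 and two at level 8, and an arriving item of size 1, the weight $B - h$ sends the item to level 1, whereas plain SS sees a tie between levels 1 and 8 and takes level 8.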



We conjecture that the $EW_n^{SS}(F) = \Theta(\log n)$ result of Theorem 3.11 for distributions
$F$ with nontrivial dead-end levels also carries over to these variants, but the
length and complexity of the proof of the original result makes verification a much less
straightforward task.
As to which of these variants performs best in practice, we performed preliminary
experimental studies using the distributions studied in [CJK+99], i.e., $U\{h, 100\}$,
$1 \le h < 100$ (as defined in the Introduction), and $U\{18, j, 100\}$, $18 \le j < 100$,
where $U\{h,j,k\}$ is the distribution in which the bin size is $k$, the set of possible item
sizes is $S = \{h, h+1, \ldots, j\}$, and all sizes in $S$ are equally likely. The distributions
in the first class are all bounded waste distributions except for $U\{99, 100\}$, for which
$EW_n^{OPT}(F) = \Theta(\sqrt{n})$. The distributions in the second class include ones with all three
possibilities for $EW_n^{OPT}(F)$: $O(1)$, $\Theta(\sqrt{n})$, and $\Theta(n)$. We also tested a few additional
more idiosyncratic distributions. The values of $n$ tested typically ranged from 100,000
to 100,000,000. Our general conclusion was that there is no clear winner among SS
and the variants described above; the best variant depends on the distribution $F$.

8.2 Objective functions with different exponents 

A second class of variants that at least satisfy the Sublinearity Property is obtained 
by changing the exponent in the objective function. 

Theorem 8.2 Suppose SrS denotes the algorithm that at each step attempts to minimize
the function $\sum_{h=1}^{B-1} N_P(h)^r$. Then for all perfectly packable distributions $F$,

\[ EW_n^{S_rS}(F) = \begin{cases} O\bigl(n^{1/r}\bigr), & 1 < r \le 2 \\ O\bigl(n^{(r-1)/r}\bigr), & 2 \le r < \infty \end{cases} \]

(Note that when $r = 2$ both bounds equal $O(\sqrt{n})$, the known bound for SS = S2S.)
Proof. Suppose $P$ is any packing and a random item $i$ is generated according to $F$. By
the argument used in the proof of Lemma 2.2, we know that there is an algorithm
$A_F$ such that if $i$ is packed by $A_F$, then for each $h$, $1 \le h \le B-1$, the expected
increase in $N_P(h)^r$, given that $N_P(h)$ changes and that the current value $N_P(h) > 0$, is
bounded by

\[ \tfrac{1}{2}\bigl((N_P(h)+1)^r - N_P(h)^r\bigr) + \tfrac{1}{2}\bigl((N_P(h)-1)^r - N_P(h)^r\bigr) = \frac{(N_P(h)+1)^r + (N_P(h)-1)^r}{2} - N_P(h)^r. \]

Let $x = \max\{N_P(h) : 1 \le h \le B-1\}$. Given that at most two counts change when
an item is packed and that the expected increase for a zero-count is at most $1^r = 1$,
the expected increase in $\sum_{h=1}^{B-1} N_P(h)^r$ when $i$ is packed is thus at most

\[ \max\bigl\{2,\ (x+1)^r + (x-1)^r - 2x^r\bigr\}. \qquad (8.9) \]
Since SrS packs items so as to minimize $\sum_{h=1}^{B-1} N_P(h)^r$, the expected increase in this
quantity when we pack $i$ using SrS instead of $A_F$ can be no greater.

We thus need to bound (8.9) when $r$ is fixed. For $x < 2$, it is clearly bounded by
a constant depending only on $r$, so let us assume that $x \ge 2$. To bound (8.9) in this
case, we know by Taylor's Theorem that there exist $\theta_1$ and $\theta_2$, $0 < \theta_1, \theta_2 < 1$, such that

\[ (x+1)^r = x^r + rx^{r-1} + \frac{r(r-1)}{2}x^{r-2} + \frac{r(r-1)(r-2)}{6}(x+\theta_1)^{r-3} \qquad (8.10) \]

\[ (x-1)^r = x^r - rx^{r-1} + \frac{r(r-1)}{2}x^{r-2} - \frac{r(r-1)(r-2)}{6}(x-\theta_2)^{r-3} \qquad (8.11) \]

Substituting, we conclude that (8.9) is bounded by the maximum of 2 and

\[ r(r-1)x^{r-2} + \frac{r(r-1)(r-2)}{6}\Bigl[(x+\theta_1)^{r-3} - (x-\theta_2)^{r-3}\Bigr]. \qquad (8.12) \]

If $1 < r \le 2$, then (8.12) has a fixed bound depending only on $r$ when $x \ge 2$. Thus if
$P_n$ is the packing that exists after all $n$ items have been packed by SrS, the expected
value of $\sum_{h=1}^{B-1} N_{P_n}(h)^r$ is $O(n)$. If $r > 2$, then (8.12) grows as $O(x^{r-2}) = O(n^{r-2})$.
Thus in this case the expected value of $\sum_{h=1}^{B-1} N_{P_n}(h)^r$ is $O(n^{r-1})$.

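Let $C_i = \sum_{h=1}^{B-1} P[N_{P_n}(h) = i]$, $0 \le i \le n$. Note that $\sum_{i=0}^{n} C_i \le B$ and $\sum_{i=1}^{n} iC_i$ is
the expected number of partially filled bins in the packing and hence an upper bound
on the expected waste. We can bound this using Holder's Inequality:

\[ \sum a_i b_i \le \Bigl(\sum a_i^p\Bigr)^{1/p} \Bigl(\sum b_i^q\Bigr)^{1/q} \quad \text{when } \frac{1}{p} + \frac{1}{q} = 1. \qquad (8.13) \]

Set $a_i = i(C_i)^{1/r}$, $b_i = (C_i)^{(r-1)/r}$, $p = r$, and $q = r/(r-1)$. In the case where $1 < r \le 2$,
we have concluded that there is a $d$ such that $\sum_{i=1}^{n} i^r C_i \le dn$. Thus Holder's Inequality
yields

\[ E[W(P_n)] \le \sum a_i b_i \le \Bigl(\sum a_i^r\Bigr)^{1/r} \Bigl(\sum C_i\Bigr)^{(r-1)/r} \le (dn)^{1/r} B^{(r-1)/r} = O\bigl(n^{1/r}\bigr) \]

as claimed. On the other hand, if $r > 2$ we have $\sum_{i=1}^{n} i^r C_i \le dn^{r-1}$ for some constant
$d$ and so Holder's Inequality yields

\[ E[W(P_n)] \le \sum a_i b_i \le \Bigl(\sum a_i^r\Bigr)^{1/r} \Bigl(\sum C_i\Bigr)^{(r-1)/r} \le d^{1/r} n^{(r-1)/r} B^{(r-1)/r} = O\bigl(n^{(r-1)/r}\bigr) \]

as claimed. ■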
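For experimentation, the SrS placement rule can be sketched in the same style as SS. This is an illustrative sketch only; the function name `srs_place` and its return convention are assumptions made here. The change in $\sum_h N_P(h)^r$ is computed exactly for each feasible level, and ties are broken toward the higher level.

```python
def srs_place(counts, B, s, r):
    """Return (level, change) for the placement minimizing sum_h N_P(h)**r.

    Level 0 means that a new bin is started; counts[h] is N_P(h) for
    1 <= h <= B-1 and counts[0] is unused.
    """
    best_h, best_delta = 0, None
    for h in range(0, B - s + 1):
        if h > 0 and counts[h] == 0:
            continue                         # no partially filled bin at level h
        delta = 0.0
        if h > 0:                            # N(h) drops by one
            delta += (counts[h] - 1) ** r - counts[h] ** r
        if h + s < B:                        # N(h+s) rises by one
            delta += (counts[h + s] + 1) ** r - counts[h + s] ** r
        if best_delta is None or delta <= best_delta:  # ties go to higher level
            best_h, best_delta = h, delta
    return best_h, best_delta
```

Setting `r = 2` reproduces the SS choice, so the same driver loop can be reused to compare different exponents on a given distribution.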



Despite the differing qualities of the bounds in Theorem 8.2, limited experiments
with the SrS for $r = 1.5$, 3, and 4 revealed no consistent winner among these variants
and SS. Indeed, they suggest that these algorithms, and perhaps all the algorithms
SrS with $r > 1$, might satisfy the Square Root and Bounded Waste Properties as
well as the Sublinearity Property. Although we currently do not see how to prove
these conjectures in general, we can show that the algorithms SrS satisfy the Bounded
Waste Property when $r \ge 2$.

Theorem 8.3 If $r \ge 2$ and $F$ is a bounded waste distribution with no nontrivial dead-end
levels, then $EW_n^{S_rS}(F) = O(1)$.

Proof. As in the proof of Theorem 3.4, we apply Hajek's Lemma. By an argument
analogous to the one used in that proof, it is straightforward to show that the desired
conclusion will follow if Hajek's Lemma can be shown to apply to the potential function

\[ \phi(x) = \Bigl(\sum_{h=1}^{B-1} x_h^r\Bigr)^{1/r}. \]

For this potential function, the Initial Bound Hypothesis applies since we begin
with the empty packing. The Bounded Variation Hypothesis applies since for a given
value $y$ of $\phi(x)$, the maximum possible change in $\phi$ occurs when a single entry in $x$
equals $y$ and all the rest are 0, in which case $\phi$ can increase to at most $y + 1$ and
decrease to no less than $y - 1$.

The main challenge in the proof is proving that the Expected Decrease Hypothesis 
applies. For this we need the following results, analogues of Lemmas and |^, 

used in the proof of Theorem | 



Lemma 8.4 Let $F$ be a perfectly packable distribution and $r \ge 2$. Then there is a
constant $d$, depending only on $r$, such that if $P$ is an arbitrary packing into bins of
size $B$ whose profile is given by the vector $x$ with $\phi(x) > 0$, $i$ is an item randomly
generated according to $F$, and $x'$ is the profile of the packing resulting if $i$ is packed
into $P$ according to SrS,

\[ E\bigl[\phi(x')^r : x\bigr] \le \phi(x)^r + d\phi(x)^{r-2}. \]

Proof. Note that $x_h \le \phi(x)$ for all $h$, $1 \le h \le B-1$, by definition. The result thus
follows by (8.12) in the proof of Theorem 8.2. ■



Lemma 8.5 Let $y$ and $a$ be positive and $r \ge 2$. Then

\[ y - a \le \frac{y^r - a^r}{ra^{r-1}}. \qquad (8.14) \]

Proof. Consider the functions $f_a(y) = (y - a) - (y^r - a^r)/(ra^{r-1})$, $a > 0$. We need to
show that for all $a > 0$, $f_a(y) \le 0$ whenever $y > 0$. But observe that the derivative

\[ f_a'(y) = 1 - \frac{y^{r-1}}{a^{r-1}} \]

is greater than 0 if $y < a$, equals 0 if $y = a$, and is less than 0 if $y > a$. Thus $f_a(y)$
takes on its maximum value when $y = a$, in which case it is 0, as desired. ■
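Inequality (8.14) is easy to sanity-check numerically. The snippet below is only a spot check over an arbitrary grid of positive values chosen for illustration, not a substitute for the proof; the helper name `gap` is hypothetical.

```python
def gap(y, a, r):
    """Right-hand side of (8.14) minus its left-hand side.

    Lemma 8.5 asserts that this is never negative for y, a > 0 and r >= 2.
    """
    return (y ** r - a ** r) / (r * a ** (r - 1)) - (y - a)

# Arbitrary illustrative grid of positive values for y and a.
GRID = [0.1 * k for k in range(1, 101)]
WORST = min(gap(y, a, r) for r in (2, 2.5, 3, 4) for a in GRID for y in GRID)
```

Up to floating-point rounding, `WORST` should not be negative; equality in (8.14) occurs exactly when $y = a$.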
Lemma 8.6 Suppose $F$ is a distribution with no nontrivial dead-end levels and $r \ge 2$.
Let P be any packing that can be created by applying SrS to a list of items all of whose 
sizes are in Up- If x is the profile of P and (^(x) > r'^B^'^^^^' where B is the bin size, 
then there is a size s G Up such that if an item of size s is packed by SrS into P, the 
resulting profile x' satisfies 



x'Y < (j){xy 



(f){x 



_g(r2-l)/r 



Proof. Let Xh be the largest level count. By the definition of we have (f){xY < Bx]^ 
and hence Xh > (j){x)/B^^'^ > r^B^. Thus h cannot be a nontrivial dead-end level and 
as in the proof of Lemma p.7| , there must be some h' > h and size s G Up such that 
h' + s < B and 

A = Xh'- Xh'+s > Xh/B > > r^B'~\ 

Let $y$ denote $x_{h'+s}$. Then if an item of size $s$ were to be packed, we could reduce
$\sum_{h=1}^{B-1} x_h^r$ by at least

\[ (y+\Delta)^r - (y+\Delta-1)^r + y^r - (y+1)^r. \]

Using Taylor's Theorem as in the proof of Theorem 8.2 but with one fewer term in the
expansions than in (8.10) and (8.11), we conclude the reduction is at least

\[ r(y+\Delta)^{r-1} - \frac{r(r-1)(y+\Delta-\theta_1)^{r-2}}{2} - ry^{r-1} - \frac{r(r-1)(y+\theta_2)^{r-2}}{2} \]

where $0 < \theta_1, \theta_2 < 1$. But note that the amount we must subtract due to the two lower
order terms is less than



r{r - l){y + AY~' < 



r{y + A) 



r-1 



< 



Since the higher order terms are r{y + AY ^ — ry^ ^ > rA^ ^, we can conclude that 



{y + A)/{r-l) 
ten 

must decrease by at least 

(r - 1)A'-^ >{r-l] 



rjy + A) 
A/r 

ry^ 



r-1 



< 



riXh 



vr-l 



rB' 



r-1 



[X 



r-1 



> 



(X 



,r-l 



2J(r2-l)/r 



as claimed. ■ 

To prove that $\phi$ satisfies the Expected Decrease Hypothesis of Hajek's Lemma, we
argue much as in the proof of Theorem 3.4. Since $F$ is a bounded waste distribution,
there is an $\epsilon > 0$ such that the process of generating items according to $F$ is equivalent
to generating items of the size $s$ specified in Lemma 8.6 with probability $\epsilon$ and otherwise
generating items according to a slightly modified perfectly packable distribution $F'$. By
Lemmas 8.4 and 8.6, the expected increase in $\phi(x)^r$ is then at most

\[ (1-\epsilon)d\phi(x)^{r-2} - \epsilon\,\frac{\phi(x)^{r-1}}{B^{(r^2-1)/r}} \]

which, assuming $\phi(x)$ is sufficiently large, is less than $-b\phi(x)^{r-1}$ for some constant
$b > 0$ depending only on $F$ and $r$. By Lemma 8.5 we thus have

\[ E[\phi(x') - \phi(x)] \le \frac{E[\phi(x')^r] - \phi(x)^r}{r\phi(x)^{r-1}} \le -\frac{b\phi(x)^{r-1}}{r\phi(x)^{r-1}} = -\frac{b}{r} \]

and so the Expected Decrease Hypothesis holds for $\phi$, Hajek's Lemma applies, and we
can conclude as in Theorem 3.4 that $EW_n^{S_rS}(F) = O(1)$. ■



8.3 Combinatorial variants 

In this section we consider satisfying the Sublinearity Property with algorithms that 
don't depend on powers of counts. As our first two candidates, consider the algorithms 
that are in a sense the limits of the SrS algorithms as $r \to 1$ and $r \to \infty$, a promising
approach since the SrS algorithms all satisfy the Sublinearity Property and may even
satisfy the Square Root Property.

An obvious candidate for a limiting algorithm when $r \to 1$ is S1S, the algorithm
that always tries to minimize $\sum_{h=1}^{B-1} N_P(h)$, the number of partially filled bins.
To do this, we simply must never start a new bin if that can be avoided and must
always perfectly pack a bin when possible (i.e., if the size of the item to be packed is
$s$ and there is a partially full bin with level $B - s$, we must place the item in such a
bin). By itself this is not a completely defined algorithm, since one needs to provide a
tie-breaking rule. If we use our standard tie-breaking rule (always choose a bin with
the highest acceptable level), note that S1S reduces to the classic Best Fit algorithm.
As already observed in the Introduction, Best Fit provably has linear expected waste
for the bounded waste distributions $U\{8, 11\}$ and $U\{9, 12\}$, and empirically seems to
behave just as poorly for many other such distributions [CJSW93]. We doubt that
any other tie-breaking rule will do better. For instance, if we always choose the lowest
available level when the item won't pack perfectly, we typically do much worse than
Best Fit. Thus no S1S algorithm is likely to satisfy the Sublinearity Property.
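In terms of level counts, S1S with the standard tie-breaking rule is just a scan from the highest acceptable level downward. The sketch below is illustrative; the function name `s1s_place` is an assumption of this example.

```python
def s1s_place(counts, B, s):
    """Level chosen by S1S with the standard tie-break, i.e. Best Fit.

    counts[h] is the number of partially filled bins at level h
    (1 <= h <= B-1); a return value of 0 means that a new bin is started.
    """
    for h in range(B - s, 0, -1):   # highest acceptable level first
        if counts[h] > 0:
            return h                # level B - s, if present, is found first
    return 0                        # no existing bin can take the item
```

Because the scan starts at level $B - s$, a perfect fit is always taken when available, and a new bin is opened only when no partially filled bin has room.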

Taking the limit of SrS as $r \to \infty$ seems more promising. Assume by convention
that $N_P(B)$ is always 0. Then S∞S is the algorithm that places an item of size $s$
into a bin of level $h$ for that $h$ with the maximum value of $N_P(h)$ in $\{h : 1 \le h \le
B - s, \text{ and } N_P(h) > N_P(h+s)\}$, should that set be non-empty, and otherwise places
the item in a bin with level $h \ge 0$ for that $h$ with the minimum value of $N_P(h+s)$, ties
always broken in favor of the higher level. It is easy to see that for any fixed packing
these are the choices that will be made by SrS for all sufficiently large values of $r$.
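The two-stage rule defining S∞S translates directly into code. The following is a minimal sketch under assumed names (`s_inf_place` and the helper `n` are hypothetical); level 0 stands for starting a new bin, and the convention $N_P(B) = 0$ is built into the helper.

```python
def s_inf_place(counts, B, s):
    """Level chosen by the S-infinity-S rule for an item of size s (0 = new bin)."""
    def n(level):
        # Convention from the text: the count for full bins, N_P(B), is 0.
        return 0 if level == B else counts[level]

    # First rule: levels whose count exceeds the count of the resulting level.
    downhill = [h for h in range(1, B - s + 1) if n(h) > n(h + s)]
    if downhill:
        # Maximum N_P(h), ties broken in favor of the higher level.
        return max(downhill, key=lambda h: (n(h), h))

    # Second rule: among feasible levels (a new bin or an existing level),
    # minimize the resulting count N_P(h+s), ties again to the higher level.
    feasible = [h for h in range(0, B - s + 1) if h == 0 or n(h) > 0]
    return min(feasible, key=lambda h: (n(h + s), -h))
```

Note that any level in the first set necessarily has a positive count, so the chosen bin always exists.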
Experiments suggest that S∞S has bounded expected waste for $U\{8, 11\}$ and
$U\{9, 12\}$ as well as all the bounded waste distributions $U\{h, 100\}$, $1 \le h \le 98$.
It still violates the Sublinearity Property, however. For example, $EW_n^{OPT}(U\{18, 27, 100\})
= \Theta(\sqrt{n})$ but experiments clearly indicate that S∞S has linear waste for
this distribution. A simpler distribution exhibiting the behavior is $F$ with $B = 51$,
$U_F = \{11, 12, 13, 15, 16, 17, 18\}$, and all sizes equally likely. Experiments convincingly
suggest that $EW_n^{S_\infty S}(F) = \Theta(n)$, but it is easy to see that this is a perfectly packable
distribution, since both the first four and the last three item sizes sum to 51.
Moreover, if one modifies $F$ to obtain a distribution $F'$ in which items of size 1 are
added, but with only 1/10 the probability of the other items, one obtains a bounded
waste distribution for which S∞S continues to have linear waste. Using other tie-breaking
rules, such as preferring the lower level bin, appears only to make things
worse. So no S∞S algorithm is likely to satisfy the Sublinearity Property.

Not surprisingly, the simpler combinatorial variants obtained by using just one of 
the two rules from the definition of SooS also fail. In the first of these, Smaxh, we 
always place an item x in a bin whose level has maximum count among all levels no 
greater than B — s{x), assuming that the count for empty bins is by definition 0. In 
the second, Sminh, we place the item so as to minimize the count of the resulting 
level, assuming that the count for full bins is by definition 0. Smaxh has linear waste 
for U{8, 11} and U{9, 12}, perhaps not surprising since even if the item to be packed
would perfectly fill a bin, Smaxh may well choose not to do this. Sminh is better, 
seeming to handle the U {j, k} appropriately. However, it has linear waste on the same 
three perfectly packable/bounded waste distributions mentioned above on which SooS 
also failed. Perhaps surprisingly, its constants of proportionality appear to be better 
than those for SooS on these distributions. This may be because, unlike the latter 
algorithm, it will choose a placement that perfectly packs a bin when this is possible. 

Indeed, perfectly packing a bin when that is possible would seem like an inherently 
good idea. We know that it is not necessary to do this, since SS doesn't always do it, 
but how could it hurt? Let perfectSS be the algorithm that places the current item so 
as to perfectly pack a bin if this is possible, but otherwise places it so as to minimize 
ss(P). Surely this algorithm should do just as well as SS. Surprisingly, there are cases
where this variant too violates the Sublinearity Property.

Consider the distribution $F$ with bin size $B = 10$, $U_F = \{1,3,4,5,8\}$, $p(1) =
p(3) = p(5) = 1/4$, and $p(4) = p(8) = 1/8$. This is a perfectly packable distribution, as
the probability vector can be viewed as a convex combination of the perfect packing
configurations (8,1,1), (4,3,3), and (5,5). However, experiments show that perfectSS
has linear waste for this distribution (as does Sminh but not S∞S). Why does this
happen? Note that essentially all the items of size 1 must be used to fill the bins that
contain items of size 8. Thus whenever a 1 arrives and there is a bin of level 8, we need
to place the 1 in such a bin. Unfortunately, perfectSS will prefer to put that 1 in a bin
with level 9 if such a bin exists, and bins with level 9 can be created in other ways than
simply with an 8 and a 1. Three 3's or a 5 and a 4 will do. On average this happens
enough times to ruin the packing. (The count for level 9 never builds up to inhibit
the nonstandard creation of such bins because level 9 bins keep getting filled by 1's.)
Standard SS avoids this problem and has $\Theta(\sqrt{n})$ expected waste because it allows the
counts for levels 8 and 9 to grow roughly as $\sqrt{n}$, with the latter being roughly half
the former. This means that placing a 1 in a bin with level 8 is a downhill move, but
creating a level 9 bin by any other means is an uphill move.
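The difference between SS and perfectSS on this example shows up in a single placement decision. The sketch below is illustrative: the function names and the particular counts used in the comment (ten bins at level 8, three at level 9) are assumptions chosen to exhibit the divergence, not data from the experiments.

```python
def ss_place(counts, B, s):
    """Level chosen by SS for an item of size s (0 = new bin), ties to higher level."""
    best_h, best_delta = 0, None
    for h in range(0, B - s + 1):
        if h > 0 and counts[h] == 0:
            continue                        # no partially filled bin at level h
        delta = 0
        if h > 0:
            delta += 1 - 2 * counts[h]      # N_P(h) drops by one
        if h + s < B:
            delta += 2 * counts[h + s] + 1  # N_P(h+s) rises by one
        if best_delta is None or delta <= best_delta:
            best_h, best_delta = h, delta
    return best_h

def perfect_ss_place(counts, B, s):
    """perfectSS: fill a bin exactly whenever possible, otherwise act like SS."""
    if s < B and counts[B - s] > 0:
        return B - s
    return ss_place(counts, B, s)

# With 10 bins at level 8 and 3 bins at level 9 (B = 10), an item of size 1
# changes ss(P) by -12 if placed at level 8 and by -5 if placed at level 9,
# so SS chooses level 8 while perfectSS is forced to level 9.
```

This mirrors the explanation in the text: perfectSS keeps draining level 9, whereas SS lets the level 9 count grow until filling a level 8 bin is the preferred move.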



8.4 Variants designed for speed 

Our final class of alternatives to SS are designed to improve the running time, possibly
at the cost of packing quality. Recall that $J$ denotes the number of item sizes under
$F$. The $\Theta(nB)$ running time for the naive implementation of SS can be improved
to $O(nJ)$ by maintaining for each item size $s \in U_F$ the list-of-lists data structure we
introduced to handle items of size 1 in the implementation of algorithm SS* described
in Section ^. This approach unfortunately will not be much of an improvement over the 
naive algorithm for distributions F with large numbers of item sizes, and it remains an 
open problem as to whether SS (or any of the variants described above that satisfy the 
Sublinearity Property) can be implemented to run in o{nB) time in general. However, 
if one is willing to alter the algorithm itself, rather than just its implementation, one 
can obtain more significant speedups. Indeed, we can devise algorithms that satisfy 
both the Square Root and Bounded Waste Properties and yet run in time $O(n \log B)$
or even $O(n)$ (although there will of course be a tradeoff between running time and
the constants of proportionality on the expected waste). 

We shall first describe the general algorithmic approach and prove that algorithms
that follow it will satisfy the two properties. We will then show how algorithms of this
type can be implemented in the claimed running times. The key idea is to use data
structures for each item size, as in the $O(nJ)$ implementation mentioned above, but
only require that they be approximately correct (so that we need not spend so much
time updating them). In particular, we maintain for each item size $s$ a set of local
values $N_{P,s}(h)$ for the counts $N_P(h)$, and only require these local counts satisfy

\[ |N_P(h) - N_{P,s}(h)| \le \delta \qquad (8.15) \]

for some constant $\delta$. When an item of size $s$ arrives, we place it so as to minimize
$ss_s(P) = \sum_{h=1}^{B-1} N_{P,s}(h)^2$, subject only to the additional constraint that we cannot
place the item in a bin with local count $\delta$ or less, since there is no guarantee that such
bins exist. Let ApproxSS$_\delta$ be an algorithm that operates in this way.
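The local-count idea can be illustrated with a deliberately simple refresh policy. The sketch below is an assumption-laden illustration, not the implementation analyzed in the text: the class name, the `age` bookkeeping, and the policy of recopying a local table once more than $\delta$ placements have occurred since its last refresh are all choices made here. The policy works because each placement changes any single level count by at most one, so (8.15) is maintained. The sketch still scans all levels per item, so it shows the invariant and the placement constraint rather than any speedup.

```python
import random

class ApproxSS:
    """Sketch of ApproxSS_delta with a naive periodic refresh of local counts."""

    def __init__(self, B, sizes, delta):
        self.B, self.delta = B, delta
        self.counts = [0] * B                      # true counts N_P(h); index 0 unused
        self.local = {s: [0] * B for s in sizes}   # local counts N_{P,s}(h)
        self.age = {s: 0 for s in sizes}           # placements since last refresh

    def place(self, s):
        B, d, loc = self.B, self.delta, self.local[s]
        best_h, best_delta = 0, None
        for h in range(0, B - s + 1):
            if h > 0 and loc[h] <= d:
                continue            # by (8.15) such a bin is not guaranteed to exist
            delta = 0
            if h > 0:
                delta += 1 - 2 * loc[h]
            if h + s < B:
                delta += 2 * loc[h + s] + 1
            if best_delta is None or delta <= best_delta:
                best_h, best_delta = h, delta
        if best_h > 0:
            self.counts[best_h] -= 1
        if best_h + s < B:
            self.counts[best_h + s] += 1
        for t in self.age:          # keep every local table within delta of the truth
            self.age[t] += 1
            if self.age[t] > d:
                self.local[t] = list(self.counts)
                self.age[t] = 0
        return best_h
```

With `delta = 0` every table is refreshed after each placement and the rule coincides with SS; larger values trade packing accuracy for less frequent updates.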

Lemma 8.7 Suppose $F$ is a perfectly packable distribution with bin size $B$, $P$ is a
packing into bins of size $B$, $\delta \ge 0$, and $x$ is an item randomly generated according to
$F$. Then if $x$ is packed according to ApproxSS$_\delta$, the expected increase in $ss(P)$ is at
most $10\delta + 3$.

Proof. We first need a generalization of Claim 2.2.1 from the proof of Lemma 2.2.
Claim 8.7.1 Suppose $F$ is a perfectly packable distribution with bin size $B$ and $\delta \ge 0$.
Then there is an algorithm $A_F$ such that for any packing $P$ into bins of size $B$, if an item $x$
is randomly generated according to $F$, $A_F$ will pack $x$ in such a way that $x$ does not go
in a bin with a level $h$ for which $N_P(h) \le \delta$ and yet for each level $h$ with $N_P(h) > \delta$,
$1 \le h \le B-1$, the probability that $N_P(h)$ increases is no more than the probability
that it decreases.



This is proved by a simple modification of the proof of Claim p.2.1| to require that for 
each optimal bin the items are ordered so that all the levels Si through Siast{Y) have 
counts greater than S and none of the levels Siastiy) + s{yi) do for i > last{y). 

Claim 8.7.1 implies that the expected increase in $ss(P)$ under $A_F$ is at most
$2\delta + 2$: If a count greater than $\delta$ changes, the proof of Lemma 2.2 implies that
the expected increase in $ss(P)$ is at most 1. Counts of $\delta$ or less can only increase,
but in this case $ss(P)$ can increase by no more than $2\delta + 1$. At most two counts can
change during any item placement, and at most one of them can be a count of $\delta$ or less.
Thus the expected change in $ss(P)$ obeys the claimed bound, and if $SS_\delta$ is the
algorithm that places items so as to minimize $ss(P)$ subject to the constraint that no item
can be placed in a partially filled bin whose level's count is $\delta$ or less, we can
conclude that the expected increase in $ss(P)$ when $SS_\delta$ places an item generated
according to $F$ is also at most $2\delta + 2$.

So consider what happens when $SS_\delta$ packs an item with size $s \in U_F$. Suppose that
placement is into a bin of level $h$, and that $N_P(h + s) - N_P(h) = d$. Note that, by the
lemma giving the change in $ss(P)$ caused by a single placement, the smallest increase in
$ss(P)$ this can represent is $2d + 1$. Now by (8.15) we must have
$N_{P,s}(h + s) - N_{P,s}(h) \le d + 2\delta$ and so the move chosen by $ApproxSS_\delta$
must place the item in a bin of level $h'$ satisfying
$N_{P,s}(h' + s) - N_{P,s}(h') \le d + 2\delta$. But then, again by (8.15), we must have
$N_P(h' + s) - N_P(h') \le d + 4\delta$ and hence, again by that lemma, $ss(P)$ can increase
by at most $2d + 8\delta + 2$, or at most $8\delta + 1$ more than the increase under
$SS_\delta$. Since the expected value for the latter was at most $2\delta + 2$, the Lemma
follows. ■
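The chain of inequalities in this proof can be laid out explicitly. The following is only a restatement of the argument above, writing $I_{SS}$ and $I_{Approx}$ (labels introduced here for illustration) for the increase in $ss(P)$ when the same item is placed by $SS_\delta$ and by $ApproxSS_\delta$, respectively:

```latex
\begin{align*}
I_{SS} &\ge 2d + 1, \\
I_{Approx} &\le 2\bigl(N_P(h'+s) - N_P(h')\bigr) + 2 \le 2(d + 4\delta) + 2 = 2d + 8\delta + 2, \\
I_{Approx} - I_{SS} &\le 8\delta + 1, \\
E[I_{Approx}] &\le E[I_{SS}] + 8\delta + 1 \le (2\delta + 2) + (8\delta + 1) = 10\delta + 3.
\end{align*}
```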



Theorem 8.8 For any $\delta > 0$,

(a) If $F$ is a perfectly packable distribution, then
$EW_n^{ApproxSS_\delta}(F) = O(\sqrt{n})$.

(b) If $F$ is a bounded waste distribution with no nontrivial dead-end levels, then
$EW_n^{ApproxSS_\delta}(F) = O(1)$.

(c) Suppose $ApproxSS'_\delta$ is the algorithm that mimics $ApproxSS_\delta$ except that
it never creates a bin that, based on the item sizes seen so far, has a dead-end level,
unless this is unavoidable, in which case it starts a new bin. Then this algorithm has
$EW_n^{ApproxSS'_\delta}(F) = O(1)$ for all bounded waste distributions, as well as
$EW_n^{ApproxSS'_\delta}(F) = O(\sqrt{n})$ for all perfectly packable distributions.

Proof. Note that for any fixed $\delta$, $10\delta + 3$ is a constant, and having a constant
bound on the expected increase in $ss(P)$ was really all we needed to prove the
corresponding results for SS and SS'. Thus the above three claims all follow by essentially
the same arguments we used for SS and SS', with constants increased appropriately to
compensate for property (8.15). ■

Let us now turn to questions of running time. 

Lemma 8.9 Suppose $t \ge 1$ and $J \ge 1$ are integers. Then there are implementations
of $ApproxSS_{tJ}$ and $ApproxSS'_{tJ}$ that work for all instances with $J$ or fewer item
sizes and run in time $O(n(1 + (\log B)/t))$.

Proof. We shall describe an implementation for $ApproxSS_{tJ}$. The implementation
for $ApproxSS'_{tJ}$ is almost identical except for the requirement that we keep track of
the dead-end levels and avoid creating bins with those levels when possible, which we
already discussed in Section 3.2.



Our implementations maintain a data structure for each item size $s$ encountered, the
data structure being initialized when the size is first encountered. We are unfortunately
unable to use the list-of-list data structure involved in the implementation of SS*, since
the efficiency of that data structure relied on the fact that counts could only change
by 1 when they were updated. Now they may change by as much as $tJ$. Therefore we
use a standard priority queue for the up to $B$ possible levels $h$ of bins into which an
item of size $s$ might be placed. Here the "possible levels" for $s$ are 0 together with all
those $h$ such that $h + s \le B$ and $N_{P,s}(h) > tJ$. The levels are ranked by the
increase in $ss_s(P)$ that would result if an item of size $s$ were packed in a bin of level
$h$. We can use any standard priority queue implementation that takes $O(1)$ time to
identify an element with minimum rank and $O(\log B)$ time to delete or insert an element.
Initially, the only element in each priority queue is the one for level 0, i.e., the
representative for starting a new bin.
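To make the ranking concrete, here is a minimal sketch (not the paper's implementation) of how the rank of a level might be computed from a table of counts and how the best eligible level could then be chosen. It uses a plain linear scan rather than a priority queue, so it illustrates the ranking rule only and not the $O(\log B)$ update bound. The names `ss_increase` and `best_level`, the dict-based count table, and the tie-breaking rule are assumptions made for illustration.

```python
def ss_increase(counts, h, s, B):
    """Change in the sum of squared counts if an item of size s goes into a bin of level h.

    counts maps a level in 1..B-1 to the number of bins at that level;
    level 0 (a new bin) and level B (a full bin) are not counted.
    """
    change = 0
    if h >= 1:                      # the bin leaves level h
        c = counts.get(h, 0)
        change += (c - 1) ** 2 - c ** 2
    if h + s <= B - 1:              # the bin arrives at level h + s
        c = counts.get(h + s, 0)
        change += (c + 1) ** 2 - c ** 2
    return change


def best_level(local_counts, s, B, threshold):
    """Pick the eligible level with the smallest increase (ties go to the lowest level).

    Level 0 (start a new bin) is always eligible; a level h >= 1 is eligible
    only if h + s <= B and its local count exceeds the threshold.
    """
    candidates = [0] + [h for h in range(1, B - s + 1)
                        if local_counts.get(h, 0) > threshold]
    return min(candidates, key=lambda h: (ss_increase(local_counts, h, s, B), h))
```

With counts `{3: 5, 7: 2}`, $B = 10$ and $s = 4$, placing the item at level 3 gives $d = 2 - 5 = -3$ and a change of $2d + 2 = -4$, while starting a new bin gives a change of $+1$, so level 3 is preferred whenever its count exceeds the threshold.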

When we pack an item $x$ of size $s$, we first identify the "best" level $h$ for it as
specified by the priority queue for $s$. We then place $x$ in a bin of level $h$ and update
the global counts $N_P(h)$ and $N_P(h + s)$. This all takes $O(1)$ time. Local counts are not
immediately changed when an item is packed. Local count updates are performed more
sporadically, and are initiated as follows. We maintain a counter $c(h)$ for each level $h$.
This counter is incremented by 1 every time $N_P(h)$ changes and reset to 1 whenever it
reaches the value $tJ + 1$. Suppose the item sizes seen so far are $s_1, s_2, \ldots, s_j$,
$j \le J$. The local count $N_{P,s_i}(h)$ is updated only when the new value of $c(h)$
satisfies $c(h) \equiv 0 \pmod{t}$ and $i = c(h)/t$. Note that this means that $N_P(h)$
changes only $tJ$ times between any two updatings of $N_{P,s_i}(h)$ and so (8.15) is
satisfied for $\delta = tJ$.
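The update schedule can be simulated in a few lines. The sketch below is an illustration under assumptions, not the authors' code: `refresh_schedule` is a hypothetical helper that replays a number of successive changes of a single global count and reports, for each size index $i$, the change numbers at which the local copy for $s_i$ would be refreshed. For simplicity it assumes all $J$ sizes have already been seen.

```python
def refresh_schedule(t, J, changes):
    """Replay `changes` changes of one global count and record local refreshes.

    Returns a dict mapping size index i (1..J) to the list of change numbers
    (1-based) at which the local count for size s_i is refreshed.
    """
    refreshed = {i: [] for i in range(1, J + 1)}
    c = 0                                  # the counter c(h)
    for step in range(1, changes + 1):
        c += 1
        if c == t * J + 1:                 # wrap around after tJ changes
            c = 1
        if c % t == 0:                     # c(h) = 0 (mod t): refresh size i = c(h)/t
            refreshed[c // t].append(step)
    return refreshed
```

For $t = 2$ and $J = 3$, twelve changes refresh size 1 at changes 2 and 8, size 2 at changes 4 and 10, and size 3 at changes 6 and 12, so consecutive refreshes of any one local count are exactly $tJ = 6$ changes apart.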

Whenever $N_{P,s}(h)$ is updated, we make up to two changes in the priority queue
for $s$, each of which involves one or two insertions/deletions and hence takes
$O(\log B)$ time: First, if $h + s \le B$ we may need to update the priority queue entry for
$h$. If $h$ is in the queue but now $N_{P,s}(h) \le tJ$, then we must delete it from the
queue. If it is not in the queue but now $N_{P,s}(h) > tJ$, we must insert it. Finally, if it
is in the queue and $N_{P,s}(h) > tJ$, but its rank is not the correct value (with respect to
$N_{P,s}(h)$ and $N_{P,s}(h + s)$), then it must be deleted and reinserted with the correct
value. Similarly, if $h - s \ge 0$, then we may have to update the entry for $h - s$.

It is easy to verify that the above correctly implements $ApproxSS_{tJ}$. The overall
running time is $O(n)$ for packing and updating the true counts $N_P(h)$ and
$O((n/t) \log B)$ for updating local counts and priority queues, as required. ■
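The two terms in this bound can be accounted for as follows. This is only a restatement of the argument: the constant 2 reflects the fact that an item placement changes at most two global counts, and each counter triggers one local update per $t$ increments.

```latex
\begin{align*}
\#\{\text{changes to global counts}\} &\le 2n, \\
\#\{\text{local count updates}\} &\le \frac{2n}{t}, \\
\text{total time} &= O(n) + \frac{2n}{t} \cdot O(\log B)
                   = O\!\left(n\left(1 + \frac{\log B}{t}\right)\right).
\end{align*}
```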



Theorem 8.10 There exist algorithms A1SS, A2SS, A1SS' and A2SS' such that

(a) All four satisfy the Square Root and Bounded Waste Properties.

(b) A1SS' and A2SS' have bounded expected waste for all bounded waste distributions.

(c) A1SS and A1SS' run in time $O(n \log B)$.

(d) A2SS and A2SS' run in time $O(n)$.



Proof. Given Theorem 8.8, it is easy to get algorithms with the above properties
from Lemma 8.9 assuming we know $J$ in advance: If we take $t = 1$ we get running
time $O(n \log B)$ and if we take $t = \log B$ we get running time $O(n)$. (The tradeoffs
only involve the constants of proportionality on the expected waste.) Moreover, it is
really not necessary to know $J$ in advance, as there are adaptive algorithms that learn
$J$ in the process of constructing their packings, still run in time $O(n \log B)$ or $O(n)$,
and have the desired average case performance. For instance, we can start by running
$ApproxSS_5$ ($ApproxSS_{5 \log B}$) as long as the number $J$ of item sizes seen so far is
no more than 5. Thereafter, whenever we see a new item size, we close all partially filled
bins, start running $ApproxSS_{J+1}$ ($ApproxSS_{(J+1) \log B}$), and then set $J = J + 1$.
Since by the analysis used in Section 3.2 the expected number of items packed before all
item sizes have been seen must be bounded by a constant for any $F$, the bins constructed
before we start running the correct algorithm contain only bounded expected waste
and so cannot endanger our conclusions about asymptotic expected waste rates. ■
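The adaptive scheme in this proof can be sketched as a small driver that tracks which tolerance is in force as new item sizes appear. This is only an outline under assumptions: `adaptive_tolerances` is a hypothetical name, the packing itself is omitted, and `t` stands for the multiplier from Lemma 8.9 (1 for the $O(n \log B)$ variant, about $\log B$ for the $O(n)$ variant).

```python
def adaptive_tolerances(item_sizes, t=1, initial_J=5):
    """Yield (size, tolerance, restarted) for each item in the stream.

    tolerance is t * J for the current guess J; restarted is True when the
    item's size is new and pushes the number of distinct sizes above J, which
    is the moment all partially filled bins would be closed.
    """
    seen = set()
    J = initial_J
    for size in item_sizes:
        restarted = False
        if size not in seen:
            seen.add(size)
            if len(seen) > J:
                J += 1              # switch to the algorithm for J + 1 sizes
                restarted = True    # close all partially filled bins here
        yield size, t * J, restarted
```

On the stream 1, 2, 3, 4, 5, 1, 6, 2, 7 with `t=1`, the tolerance stays at 5 until size 6 arrives, then becomes 6, and becomes 7 when size 7 arrives; restarts happen only at those two items.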

We can also devise fast analogues of the distribution-specific algorithms $SS^F$
that always have $ER_\infty^{SS^F}(F) = 1$, even for distributions whose optimal expected
waste is linear. This however involves more than just applying the approximate data
structures described above. The $O(nB)$ running times for the $SS^F$ algorithms derive from
two sources, only one of which (the need for $\Omega(B)$ time to pack an item) is eliminated
by using the approximate data structures. The second source of $\Omega(nB)$ time is the need
to possibly pack $\Omega(nB)$ imaginary items of size 1.

To avoid this obstacle, we need an additional idea. Recall that $SS^F$ attains
$ER_\infty^{SS^F}(F) = 1$ by simulating the application of SS to a perfectly packable
distribution $F'$ derived from $F$. The modified distribution $F'$ was constructed using the
optimal value $c(F)$ of the linear program for $F$. Distribution $F'$ was equivalent to
generating items according to $F$ with probability $1/(1 + c(F))$ and otherwise
generating an (imaginary) item of size 1.

Our new approach uses more information from the solution to the LP. Let $v(j, h)$,
$1 \le j \le J$ and $0 \le h \le B - 1$, be the variable values in an optimal solution for
the LP for $F$. For $1 \le h \le B - 1$ define

$$\Delta_h = \sum_{j=1}^{J} v(j, h - s_j) - \sum_{j=1}^{J} v(j, h).$$

Note that $\Delta_h$ is essentially the percentage of partially filled bins in an optimal
packing whose gap is of size $B - h$. Let $T = \sum_{h=1}^{B-1} \Delta_h$ and note that we
must have $T \le 1$. Our new algorithm uses SS to pack the modified distribution $F''$
obtained as follows. With probability $1/(1 + T)$ we generate items according to the
original distribution $F$. Otherwise (with probability $T/(1 + T)$) we generate
"imaginary" items according to the distribution in which items of size $s$ have probability
$\Delta_{B-s}/T$. It is not difficult to show that this is a perfectly packable distribution
and that the expected total size of the imaginary items is $c(F)$, as in $SS^F$. Now,
however, the number of imaginary items is bounded by $n$, so the time for packing them is no
more than that for packing the real items, and hence can be $O(n \log B)$ or $O(n)$ as
needed.
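As an illustration of this construction, the following sketch computes the gap values and their sum $T$ from a given LP solution and then draws one item from the modified distribution. It assumes the optimal values $v(j, h)$ are already available as a dictionary (solving the LP is not shown), treats $v(j, h - s_j)$ as 0 when $h < s_j$, and uses the hypothetical names `gap_profile` and `draw_item`.

```python
import random


def gap_profile(v, sizes, B):
    """Return (delta, T): delta[h] for 1 <= h <= B-1 and their sum T.

    v maps (j, h) to the LP value for size index j (0-based here) and level h;
    missing keys, including those with a negative level, count as 0.
    """
    delta = {}
    for h in range(1, B):
        arriving = sum(v.get((j, h - s), 0.0) for j, s in enumerate(sizes))
        leaving = sum(v.get((j, h), 0.0) for j in range(len(sizes)))
        delta[h] = arriving - leaving
    return delta, sum(delta.values())


def draw_item(probs, sizes, delta, T, B, rng=random):
    """Draw (size, is_imaginary) from the modified distribution."""
    if T <= 0 or rng.random() < 1.0 / (1.0 + T):
        # Real item, generated according to the original distribution.
        return rng.choices(sizes, weights=probs)[0], False
    # Imaginary item: size s is drawn with weight delta[B - s].
    gaps = [s for s in range(1, B) if delta[B - s] > 0]
    weights = [delta[B - s] for s in gaps]
    return rng.choices(gaps, weights=weights)[0], True
```

As a sanity check on a toy case, take $B = 10$ and a single item size 6 with probability 1, so that every bin in an optimal packing holds one item and $v(1, 0) = 1$. Then the only nonzero gap value is at level 6, $T = 1$, and the modified distribution produces real items of size 6 and imaginary items of size 4 with equal probability, which is perfectly packable.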

One can construct a learning algorithm SS** based on these variants just as we
constructed the learning algorithm SS* based on the original $SS^F$ algorithms. We
conjecture that SS** will satisfy the same general conclusions as listed in the theorem
for SS*. The proof will be somewhat more complicated, however, and so we leave
the details to interested readers.

We should note before concluding the discussion of fast variants of SS that our
results on this topic are probably of theoretical interest only. A complicated
$O(n \log B)$ algorithm like $ApproxSS_J$ would be preferable to an $O(nB)$ or $O(nJ)$
implementation of SS only when $J$ is fairly large, presumably well over 100. However, the
constants involved in the expected waste produced by $ApproxSS_J$ are substantial in this
case.

For instance, consider the bounded waste distribution $U\{400, 1000\}$. For $n$ = 100,000,
$ApproxSS_{100}$ typically uses 100,000 bins, i.e., one per item and roughly 5 times the
optimal number, even though Theorem 8.10 says that the expected waste is asymptotically
$O(1)$. On the other hand, Best Fit, which also runs in time $O(n \log B)$ but is
conjectured to have linear expected waste for this distribution, uses roughly 0.3% more bins
than necessary. (SS uses roughly 0.25%.) Things have improved by the time
$n$ = 10,000,000, but not enough to change the ordering of algorithms. Now $ApproxSS_J$
uses only roughly 9.8% more bins than necessary, while Best Fit uses roughly 0.28%.
SS is down to an average excess of 0.0025%. This consists of roughly 50 excess bins
(as compared to 45 for $n$ = 100,000) and should be compared to the roughly 200,000
excess bins for $ApproxSS_J$. Admittedly the latter algorithm could be modified to
significantly lower its expected waste, but it is unlikely that it could be made competitive
with Best Fit except for much larger values of $n$.



9 Conclusions and Open Questions 

In this paper we have discussed a collection of new, nonstandard, and surprisingly 
effective algorithms for the classical one-dimensional bin packing problem. We have 
done our best to leave as few major open problems as possible, but several interesting 
ones do remain: 

• Can SS itself be implemented to run in time $o(nB)$, so that we aren't forced to
use the approximate versions described in the previous section?

• What is $\max\{ER_\infty^{SS}(F) : F \text{ is a discrete distribution}\}$? The results of
Section 4 only show that this maximum is at least 1.5 and no more than 3.0. A related
question is what is the asymptotic worst-case performance ratio for SS. Here the
results of [vV92] for arbitrary on-line algorithms imply a lower bound of 1.54,
but the best upper bound is still the abovementioned 3.0.

• Is our conjecture correct that SrS satisfies both the Square Root and Bounded 
Waste Properties for all r > 1? Is there any polynomial-time algorithm that sat- 
isfies the Sublinearity Property and does not involve at least implicitly computing 
the powers of counts? 

• Can one obtain a meaningful theoretical analysis of the constants of proportion- 
ality involved in the expected waste rates for particular distributions and the 
various bin packing algorithms we have discussed? Empirically we have observed 
wide differences in these constants for algorithms that, for example, both have 
bounded expected waste for a given distribution F, so theoretical insights here 
may well be of practical value. 

• Is there an effective way to extend the Sum-of-Squares approach to continuous 
distributions while preserving its ability to get sublinear waste when the optimal 
waste is sublinear? 

Finally, there is the question of the extent to which approaches like that embodied
in the Sum-of-Squares algorithm can be applied to other problems. A first step in
this direction is the adaptation of SS to the bin covering problem in [CJK01]. In bin
covering we are given a set of items and a bin capacity $B$, and must assign the items
to bins so that each bin receives items whose total size is at least $B$ and the number of
bins packed is maximized. Here "waste" is the total excess over $B$ in the bins and the
class of "perfectly packable distributions" is the same as for ordinary bin packing. The
interesting challenge here becomes to construct algorithms that have good worst- and
average-case behavior for distributions that aren't perfectly packable, while still having
$O(\sqrt{n})$ expected waste for perfectly packable distributions. For details, see [CJK01].



The results for bin covering suggest that the Sum-of-Squares approach may be 
more widely applicable, but bin covering is still quite close to the original bin packing 
problem. Can the Sum-of-Squares approach (or something like it) be extended to 
problems a bit further away?